in reply to =~ matches non-existent symbols

A wild guess. Could this be OS dependent? Losedows uses CRLF to denote the end of a line, while *u*x (I believe) uses only LF. Thus a file created in Losedows and parsed on Linux might have a CR character that wasn't expected at the end of each line. I don't know if something comparable might happen at the end of a file. Try cutting your file down (halving the size each time would be normal) until you isolate the bit that causes the problem. Then print out the problem bit between delimiters such as angle brackets. You might also print out the length of that string. This may help you see what characters are really there, not just what characters you can see.
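As a rough sketch of the "delimiters plus length" idea, something along these lines might do. The helper name show_hidden and the sample string are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): wrap a string in
# angle brackets, show CR and LF as \r and \n, and report its length.
sub show_hidden {
    my ($str) = @_;
    my $len = length $str;               # length of the untouched string
    (my $visible = $str) =~ s/\r/\\r/g;  # make carriage returns visible
    $visible =~ s/\n/\\n/g;              # make line feeds visible
    return sprintf "<%s> length=%d", $visible, $len;
}

# Made-up sample: a line as it might arrive from a CRLF file read on Linux.
my $line = "atcg\r\n";
print show_hidden($line), "\n";    # prints <atcg\r\n> length=6

chomp $line;                       # with the default $/, only "\n" is removed
print show_hidden($line), "\n";    # prints <atcg\r> length=5
```

If the second line of output still shows a \r, the stray carriage return is the likely culprit.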

Regards,

John Davies

Re^2: =~ matches non-existent symbols
by Anonymous Monk on Nov 16, 2014 at 20:45 UTC
    Try cutting your file down (half the size each time would be normal) until you get the bit that causes the problem. Then print out the problem bit in delimiters like angle brackets. You might also print out the length of that string. This may help you see what characters are really there, not just what characters you can see.
    OMG, dear OP, don't do that! Use Perl instead, really.
    $ echo -n $'atcg\r\nhello\r\n' > ATCG_FILE # this is our test file
    $ perl -mcharnames -e 'my $s = join "", <>; printf "%s: %d\n", charnames::viacode(ord $1), pos($s) while $s =~ m/([^atcg])/ig' ATCG_FILE
    CARRIAGE RETURN: 5
    LINE FEED: 6
    LATIN SMALL LETTER H: 7
    LATIN SMALL LETTER E: 8
    LATIN SMALL LETTER L: 9
    LATIN SMALL LETTER L: 10
    LATIN SMALL LETTER O: 11
    CARRIAGE RETURN: 12
    LINE FEED: 13