The first reply gives the correct answer about how to "change the codepage when perl saves the file", but the experiment proposed by the Anonymous Monk above is important.
Problems with character encodings for non-ASCII text are difficult to describe clearly. If you know what specific encoding(s) you are dealing with (or would like to use), you need to be clear about that -- name them. If you can't figure out the explicit name(s), then it is certainly helpful to say what language you are dealing with (thanks for doing that).
When characters simply don't come out right, it helps to have an explicit numeric rendering of the byte sequence (e.g. hex digits for each byte value). The unpack function (with a template like "H*") is handy for this, though it can be a little tough to figure out sometimes... Another way is:
my $hexstring = join " ", map { sprintf("0x%04x", ord()) } split //, $foobar_string;
(Update: changed incorrect use of "chr()" to correct use of "ord()".)
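For reference, here is a minimal sketch of the unpack idiom mentioned above ($foobar_string is just a placeholder for whatever string you are examining):

# unpack "H*" looks at raw bytes, so it is most useful on data that has
# not been decoded yet (e.g. bytes read straight from a file):
my $raw_hex = unpack "H*", $foobar_string;
print "$raw_hex\n";    # e.g. "466f6f" for the three bytes "Foo"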
That renders every "character" of the string as a four-digit hex number (with leading zeros, e.g. 0x0041 for "A"). If perl is treating the original string as single-byte "characters", the hex numbers will all start with "0x00"; if the string is flagged as containing utf8 characters, the hex numbers will be the Unicode codepoint values (sprintf simply prints more digits for codepoints above 0xffff). Using a technique like this to see exactly what is in your data (whether file contents or file names) is usually very helpful.
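To see the difference concretely, here is a small self-contained example comparing un-decoded bytes with decoded characters (the string contents are just a made-up illustration using "e-acute" in UTF-8):

#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Dump each "character" of a string as a 4-digit hex number
sub hexdump {
    my ($s) = @_;
    return join " ", map { sprintf("0x%04x", ord()) } split //, $s;
}

# The UTF-8 byte sequence for "e-acute" is 0xC3 0xA9
my $raw   = "\xC3\xA9";                # raw, un-decoded bytes
my $chars = decode("UTF-8", $raw);     # decoded to Perl characters

print hexdump($raw),   "\n";   # 0x00c3 0x00a9 -- two single-byte "characters"
print hexdump($chars), "\n";   # 0x00e9        -- one character, codepoint U+00E9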