atnonis has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I've created a Perl script which runs under Windows XP (DOS/command prompt). It "dirs" the contents of a CD-ROM and then saves the listing to a text file. The problem I have is that the contents might contain Greek characters, and when saving the file I see characters like Άε9.xls.

How can I change the codepage when Perl saves the file?

Thanks,
Antonis

Re: changing codepage while saving to a text file
by Happy-the-monk (Canon) on Mar 09, 2006 at 10:58 UTC

    You could use Encode and its friends to change the encoding of your data before writing.
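A minimal sketch of the Encode route, assuming the incoming data is in the Greek Windows codepage (cp1253) and the target is UTF-8. The sample bytes and variable names are made up for illustration: decode the bytes into Perl characters first, then encode them to whatever you want to write.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed sample input: the two bytes cp1253 uses for the Greek letters "Ελ".
my $cp1253_bytes = "\xC5\xEB";

# Step 1: bytes in the source codepage -> Perl characters.
my $chars = decode('cp1253', $cp1253_bytes);

# Step 2: Perl characters -> bytes in the target encoding.
my $utf8_bytes = encode('UTF-8', $chars);

printf "%d characters, %d UTF-8 bytes\n", length($chars), length($utf8_bytes);
# prints "2 characters, 4 UTF-8 bytes"
```

If the data turns out to be in a different codepage, only the first argument to decode needs to change.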

    Or you could use the functionality of open that allows you to specify the encoding of your data, as given in the example code open(FH, "<:utf8", "file") (which is for reading; writing works analogously, with ">:utf8"). I haven't used that yet, but it sounds like the right solution.

    I don't know about codepages, though; I'd suggest reading the Encode documentation on them.
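For writing, that could look roughly like the sketch below. The file name listing.txt and the sample names are hypothetical, and it assumes the names are already Perl character strings (not raw codepage bytes).

```perl
use strict;
use warnings;

# Hypothetical output file and sample data, for illustration only.
my $outfile = 'listing.txt';

# "Ελληνικά.txt" written with escapes, so the example does not depend
# on how this source file itself is saved.
my @names = (
    "\x{0395}\x{03bb}\x{03bb}\x{03b7}\x{03bd}\x{03b9}\x{03ba}\x{03ac}.txt",
    "report.xls",
);

# The layer encodes Perl characters to the chosen encoding on output.
# Use '>:encoding(cp1253)' instead to write the Greek Windows codepage.
open(my $out, '>:encoding(UTF-8)', $outfile)
    or die "could not open $outfile for writing: $!";
print {$out} "$_\n" for @names;
close($out) or die "could not close $outfile: $!";
```

The ':encoding(UTF-8)' form is the stricter relative of ':utf8'; either one sits on the filehandle, so the print statements themselves do not change.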

    Cheers, Sören

Re: changing codepage while saving to a text file
by insaniac (Friar) on Mar 09, 2006 at 12:29 UTC
    or maybe something like:
    use utf8;
    # $string being your string with wicked chars
    utf8::decode($string);

    hope this helps...

    to ask a question is a moment of shame
    to remain ignorant is a lifelong shame

Re: changing codepage while saving to a text file
by Anonymous Monk on Mar 09, 2006 at 16:17 UTC
    Please name a file Ελληνικά and run the program below. Tell us what the result is, if it doesn't work like you intended.
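    use strict;
    use diagnostics;
    use Encode 'decode';

    # Decode each file name from cp1253 bytes into Perl characters.
    # (Converting to UTF-8 bytes with from_to and then printing through
    # the :utf8 layer below would encode the names twice.)
    my @filenames = map { decode( 'cp1253', $_ ) } glob '*';

    {
        open my $filelisting, '>', 'filelisting'
            or die "could not open <filelisting> for writing: $!";
        binmode $filelisting, ':utf8';
        print $filelisting join "\n", @filenames;
        close $filelisting;
    }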
Re: changing codepage while saving to a text file
by graff (Chancellor) on Mar 09, 2006 at 23:11 UTC
    The first reply gives the correct answer about how to "change the codepage when perl saves the file", but the experiment proposed by the Anonymous Monk above is important.

    Problems with character encodings for non-ASCII text are difficult to describe clearly. If you know what specific encoding(s) you are dealing with (or would like to use), you need to be clear about that -- name them. If you can't figure out the explicit name(s), then it is certainly helpful to say what language you are dealing with (thanks for doing that).

    When characters are simply not coherent, it helps to have an explicit numeric rendering of the byte sequence (e.g. using hex digits for each byte value). The pack function is handy for this, though it can be a little tough to figure out sometimes... Another way is:

    my $hexstring = join " ", map { sprintf("0x%04x",ord()) } split //, $foobar_string;
    (Update: changed incorrect use of "chr()" to correct use of "ord()".)

    That will print every "character" of a string as a four-digit hex number (with leading zeros, e.g. 0x0041 for "A"). If the original string is being treated by perl as single-byte "characters", the hex numbers will all start with "0x00"; if the string is flagged as containing utf8 characters, the hex numbers will be the Unicode "codepoints" (e.g. 0x0395 for the Greek capital "Ε"; codepoints above 0xFFFF print with more than four digits). Using a technique like this to know what is in your data (whether file contents or file names) is usually very helpful.
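To make the difference concrete, here is a small sketch that dumps the same two Greek letters both ways. The helper name hexdump and the sample bytes are assumptions for illustration; the bytes are what cp1253 would use for "Ελ".

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: one four-digit hex number per "character".
sub hexdump {
    my ($string) = @_;
    return join ' ', map { sprintf('0x%04x', ord($_)) } split //, $string;
}

my $bytes = "\xC5\xEB";                  # raw cp1253 bytes (assumed sample)
my $chars = decode('cp1253', $bytes);    # the same text as Perl characters

my $byte_dump = hexdump($bytes);
my $char_dump = hexdump($chars);

print "$byte_dump\n";    # 0x00c5 0x00eb
print "$char_dump\n";    # 0x0395 0x03bb

# For a plain byte string, unpack gives a compact hex rendering.
my $packed_hex = unpack('H*', $bytes);
print "$packed_hex\n";   # c5eb
```

If the dump of a file name shows only values below 0x0100, perl is still holding undecoded bytes, and the question becomes which codepage those bytes are in.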