changing codepage while saving to a text file

atnonis has asked for the wisdom of the Perl Monks concerning the following question:

Re: changing codepage while saving to a text file by Happy-the-monk (Canon) on Mar 09, 2006 at 10:58 UTC
You could use Encode and its friends to change the encoding of your data before writing. Or you could use the functionality of open that allows you to specify the encoding of your data, as in the example code `open(FH, "<:utf8", "file")` (which is for reading; writing works analogously). I haven't used that yet, but it sounds like the right solution. I don't know about codepages though, so I'd suggest you read the Encode documentation on them. Cheers, Sören
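A minimal sketch of both suggestions, assuming the data is Greek text held as Perl characters and the wanted codepage is cp1253 (the question does not name one). `write_encoded`, `$text` and `$bytes` are illustrative names, not something from the thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: write character strings to $path in the encoding $target.
# $path may also be a reference to a scalar, which Perl opens as an in-memory file.
sub write_encoded {
    my ($path, $target, @lines) = @_;
    # The :encoding() layer converts characters to bytes on every print.
    open my $fh, ">:encoding($target)", $path
        or die "could not open <$path> for writing: $!";
    print {$fh} "$_\n" for @lines;
    close $fh or die "could not close <$path>: $!";
    return;
}

# The other route: encode by hand with Encode, then write the bytes to a raw handle.
# The escapes spell the Greek word for "Greek", so the source file can stay pure ASCII.
my $text  = "\x{0395}\x{03BB}\x{03BB}\x{03B7}\x{03BD}\x{03B9}\x{03BA}\x{03AC}";
my $bytes = encode('cp1253', $text);
```

A call would look like `write_encoded('out.txt', 'cp1253', $text)`, where `out.txt` is a placeholder name. The layer approach keeps the conversion in one place, while `encode` is handy when the bytes go somewhere other than a file handle.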
Re: changing codepage while saving to a text file by insaniac (Friar) on Mar 09, 2006 at 12:29 UTC
Or maybe something like: `use utf8; utf8::decode($string); # $string being your string with wicked chars` Hope this helps... To ask a question is a moment of shame; to remain ignorant is a lifelong shame.
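A small sketch of what `utf8::decode` does, assuming `$string` already holds UTF-8 octets; it does not help if the data is in a codepage such as cp1253. The `utf8::` functions are available without `use utf8`, which only declares that the source file itself is written in UTF-8.

```perl
use strict;
use warnings;

# UTF-8 octets for two Greek letters (U+0395, U+03BB), as a byte-oriented source would deliver them.
my $string = "\xCE\x95\xCE\xBB";

# utf8::decode converts in place; it returns false when the octets are not well-formed UTF-8.
utf8::decode($string) or die "input is not valid UTF-8";

# $string now holds two characters instead of four bytes.
my $count = length $string;
```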
Re: changing codepage while saving to a text file by Anonymous Monk on Mar 09, 2006 at 16:17 UTC
Please name a file `Ελληνικά` and run the program below. Tell us what the result is, if it doesn't work like you intended. `use strict; use diagnostics; use Encode 'from_to'; my @filenames = map { from_to( $_, 'cp1253' => 'utf8', ); $_; } glob '*'; { open my $filelisting, '>', 'filelisting' or die "could not open <filelisting> for writing: $!"; binmode $filelisting, ':utf8'; print $filelisting join "\n", @filenames; close $filelisting; };`
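One thing to watch when reading the result of that experiment: `from_to` leaves UTF-8 octets in the variable, so printing them through a `:utf8` layer would, as far as I can tell, encode them a second time. A hedged variant, assuming the file names really do arrive as cp1253 bytes, decodes to characters first and lets the output layer do the only encoding step. `cp1253_to_chars` and `write_listing` are illustrative names.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn cp1253 octets into a Perl character string.
sub cp1253_to_chars {
    my ($octets) = @_;
    return decode('cp1253', $octets);
}

# Hypothetical helper: write the names as UTF-8, one per line.
# $target is a file name, or a reference to a scalar for an in-memory file.
sub write_listing {
    my ($target, @names) = @_;
    open my $out, '>:encoding(UTF-8)', $target
        or die "could not open listing for writing: $!";
    print {$out} join "\n", @names;
    close $out or die "could not close listing: $!";
    return;
}
```

Usage would be along the lines of `write_listing('filelisting', map { cp1253_to_chars($_) } glob '*')`.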
Re: changing codepage while saving to a text file by graff (Chancellor) on Mar 09, 2006 at 23:11 UTC
The first reply gives the correct answer about how to "change the codepage when perl saves the file", but the experiment proposed by the Anonymous Monk above is important. Problems with character encodings for non-ASCII text are difficult to describe clearly. If you know what specific encoding(s) you are dealing with (or would like to use), you need to be clear about that -- name them. If you can't figure out the explicit name(s), then it is certainly helpful to say what language you are dealing with (thanks for doing that). When characters are simply not coherent, it helps to have an explicit numeric rendering of the byte sequence (e.g. using hex digits for each byte value). The pack and unpack functions are handy for this, though they can be a little tough to figure out sometimes... Another way is: `my $hexstring = join " ", map { sprintf("0x%04x",ord()) } split //, $foobar_string;` (Update: changed incorrect use of "chr()" to correct use of "ord()".) That renders every "character" of a string as a four-digit hex number (with leading zeros, e.g. 0x0041 for "A"). If the original string is being treated by perl as single-byte "characters", the hex numbers will all start with "0x00"; if the string is flagged as containing utf8 characters, the hex numbers will be the Unicode code points (four digits for anything in the 16-bit range). Using a technique like this to know what is in your data (whether file contents or file names) is usually very helpful.
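The one-liner above can be wrapped up for reuse. This is only a sketch: `hex_dump` and `byte_dump` are illustrative names, and `byte_dump` assumes its argument is a byte string.

```perl
use strict;
use warnings;

# Hypothetical helper: render each character as a four-digit hex number.
sub hex_dump {
    my ($string) = @_;
    return join ' ', map { sprintf '0x%04x', ord($_) } split //, $string;
}

# Hypothetical helper: the unpack route, two hex digits per byte of a byte string.
sub byte_dump {
    my ($bytes) = @_;
    return join ' ', unpack '(H2)*', $bytes;
}

print hex_dump("A\x{0395}"), "\n";   # 0x0041 0x0395  (character string)
print hex_dump("\xCE\x95"), "\n";    # 0x00ce 0x0095  (the same letter as undecoded UTF-8 bytes)
print byte_dump("\xCE\x95"), "\n";   # ce 95
```

Seeing `0x00ce 0x0095` where `0x0395` was expected is the typical sign that the data is still undecoded bytes.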