in reply to UTF-8 Decoding, Wide Characters, and XML::Twig
Which one you need depends on which language is being used in the data: what are the non-ASCII characters? If they are just the basic Latin-1 set for "Western" languages (French, German, Spanish), then you probably want CP1252. Or it could just be that the non-ASCII characters are those nefarious "smart quotes" and other bothersome punctuation marks being foisted on us all, in which case any CP12?? charset will do.
You can use binmode on your output file handle to impose the conversion from Perl-internal utf8 to cp1252 (or whatever). This way, no information is lost (provided every character in the data exists in the chosen code page), each character goes out as a single byte, and the fixed-width lines get the right byte count:
binmode OUTFH, ":encoding(cp1252)";
(Someday, the boss might get the idea that the downstream process that needs the fixed-width file as input ought to accommodate utf8 data, and at that point you'll need to take out the binmode line, or maybe just change ":encoding(cp1252)" to ":utf8".)
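To see the effect on byte counts, here is a minimal sketch. It assumes a made-up 14-character field and uses an in-memory handle in place of OUTFH so it runs without touching a real file; the variable names and the sample data are mine, not from the original question.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical 14-character field: c-cedilla plus curly "smart quotes".
my $record = sprintf "%-14s", "Fran\x{e7}ois \x{201c}Z\x{201d}";

# In-memory handle stands in for OUTFH so the sketch needs no real file.
my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";
binmode $out, ":encoding(cp1252)";
print {$out} $record;
close $out or die "Cannot close handle: $!";

my $chars       = length $record;                   # characters in the record
my $cp1252_size = length $buffer;                   # bytes after the cp1252 layer
my $utf8_size   = length encode("UTF-8", $record);  # bytes if written as UTF-8

print "chars=$chars cp1252=$cp1252_size utf8=$utf8_size\n";
# prints: chars=14 cp1252=14 utf8=19
```

With the cp1252 layer the byte count matches the character count, which is what a fixed-width consumer expects. The same record written as UTF-8 comes out five bytes longer.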