encoding and module

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: encoding and module by Joost (Canon) on Jun 09, 2006 at 10:30 UTC
"\x{80}\x{92}" etc. just indicate characters by hexadecimal number. Which characters those numbers refer to depends on the encoding you're using, and you usually can't tell from the character number alone. The only thing we can say is that it's not a 7-bit encoding, since \x80 needs 8 bits. See Encode, ord, chr, perldata, perlio, binmode, utf8 etc. "What should it profit a man, if he should win a flame war, yet lose his cool?"
Re: encoding and module by badaiaqrandista (Pilgrim) on Jun 09, 2006 at 06:01 UTC
Check out http://perldoc.perl.org/functions/ord.html and http://perldoc.perl.org/functions/chr.html. -cheepy-
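For readers who want to see what those two functions do, here is a minimal sketch. The variable names are only illustrative. It shows that ord and chr convert between a character and its number, and that neither says anything about which encoding the number belongs to.

```perl
use strict;
use warnings;

# One character written as a hex escape (illustrative value).
my $char = "\x{92}";

# ord() returns the character's number; 0x92 is 146 in decimal.
my $number = ord($char);
printf "ord: %d (0x%02X)\n", $number, $number;

# chr() goes the other way and rebuilds the same character.
my $round_trip = chr($number);
print "round trip ok\n" if $round_trip eq $char;
```

The number alone is all you get back, so working out what 0x92 is supposed to look like still requires knowing the source encoding.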
Re: encoding and module by graff (Chancellor) on Jun 10, 2006 at 00:33 UTC
Single-byte character codes in the range \x80-\x9f are not used for printable characters in any of the ISO-8859 sets (Latin, Greek, Cyrillic, Arabic, Hebrew), and since Unicode keeps ISO-8859-1 as its first 256 code points, this range is "unprintable" when converted directly to Unicode. (update: by that I mean, if you just extended these to 16-bit values by adding a null high byte) The various Microsoft code pages tend to use this range for miscellaneous printable characters instead. All the Microsoft CP125n code pages (n=0..8) tend to use these codes for the same set of miscellaneous characters, things like special quote characters and symbols, but the earlier MS-DOS code pages (CP8..) tend to use the range in different ways. So you need to know something about where the data are coming from in order to know what to do with characters in this range. The standard installation of the Encode module handles all the DOS/Windows code pages, so if the data are CP125* (which is likely), just pick any of those as the "legacy" encoding in order to convert correctly to Unicode.
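A rough sketch of that approach, using only decode and encode from the Encode module. The helper code_points is a made-up name for illustration, the input is the two bytes quoted in the first reply, and cp437 stands in here for the older MS-DOS code pages. The final step assumes the data really is CP1252.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: list the Unicode code points that $bytes
# decodes to when interpreted under $encoding.
sub code_points {
    my ($encoding, $bytes) = @_;
    my $text = decode($encoding, $bytes);
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $text;
}

# The two bytes quoted in the first reply.
my $bytes = "\x80\x92";

# Same bytes, three interpretations: ISO-8859-1 gives control
# characters, the Windows and DOS code pages give printable ones.
for my $encoding (qw(iso-8859-1 cp1252 cp437)) {
    printf "%-10s => %s\n", $encoding, code_points($encoding, $bytes);
}

# Assumption: the data is CP1252. Decode it to Unicode, then
# encode as UTF-8 for output.
my $utf8 = encode('UTF-8', decode('cp1252', $bytes));
printf "UTF-8 bytes: %s\n",
    join ' ', map { sprintf '%02X', ord($_) } split //, $utf8;
```

If the result looks wrong for your data, try another legacy encoding name in the decode call; Encode->encodings(":all") lists the names available in your installation.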