(Update: After code tags were added to "tidy things up", it seems the nice DOS glyphs are gone. Too bad... maybe the janitors can restore the earlier form, which I thought was quite clear.) (thanks, Arunbear!)
Second, in order to display your text correctly in the MSDOS-Prompt window, the encoding you need to use is the one called cp437. Just convert your text to that encoding, and it should look just fine.
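To make the conversion step concrete, here is a minimal sketch in Python (used only for illustration; nothing in this thread depends on it). The variable names are made up, and the word is written with `\u00e9` escapes so the example does not depend on how the script file itself is saved.

```python
# "Québécois", spelled with escapes so the source file's encoding is irrelevant.
text = "Qu\u00e9b\u00e9cois"

# Convert the string to the code page the MSDOS-Prompt window expects.
cp437_bytes = text.encode("cp437")

# In cp437 the byte for "é" is 0x82, not 0xE9.
print(cp437_bytes)  # b'Qu\x82b\x82cois'
```

Those are the bytes that should reach the console for the word to display correctly there.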
It seems like you have a good understanding of what it means to convert text data to different encodings for output, and your different renderings of "Québécois" make sense, given that they are being viewed with a cp437-based display tool.
For ISO-8859-1, CP1252 and Unicode, the numeric code for "é" is 0xE9. When expressed in UTF-16LE, that becomes the two-byte sequence "\xE9\x00" (the 16-bit value 0x00E9, low byte first); when converted to UTF-8, it becomes the two-byte sequence "\xC3\xA9" (perlunicode explains why this is so, in the section titled "Unicode Encodings", about halfway down).
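If you want to check those byte sequences yourself, a small Python sketch like this one (illustrative only; the names are hypothetical) prints the bytes for "é" under each encoding mentioned above:

```python
# "é" is code point 0xE9 (U+00E9).
e_acute = "\u00e9"
assert ord(e_acute) == 0xE9

# Encode the same character four ways and keep the raw bytes.
encodings = {
    name: e_acute.encode(name)
    for name in ("iso-8859-1", "cp1252", "utf-16-le", "utf-8")
}

for name, raw in encodings.items():
    # Show each byte as two hex digits.
    print(name, " ".join(f"{b:02x}" for b in raw))
# iso-8859-1 e9
# cp1252 e9
# utf-16-le e9 00
# utf-8 c3 a9
```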
Also, your conversions to Unicode have caused the "byte-order mark" (BOM) to be included at the beginning of the string. The BOM is code point 0xFEFF; in UTF-16LE, that's "\xFF\xFE", and in UTF-8, it's "\xEF\xBB\xBF".
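The BOM bytes can be verified the same way. This is a hedged Python sketch, not a description of what your script does; `strip_bom` is a hypothetical helper showing one way to drop a leading BOM from an already-decoded string.

```python
import codecs

BOM = "\ufeff"  # the byte-order mark, code point 0xFEFF

bom_utf16le = BOM.encode("utf-16-le")
bom_utf8 = BOM.encode("utf-8")

print(bom_utf16le)  # b'\xff\xfe'
print(bom_utf8)     # b'\xef\xbb\xbf'

# The standard library exposes the same byte sequences as constants.
assert bom_utf16le == codecs.BOM_UTF16_LE
assert bom_utf8 == codecs.BOM_UTF8


def strip_bom(text: str) -> str:
    """Return text without a leading BOM character, if one is present."""
    return text[1:] if text.startswith(BOM) else text
```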
You can look up those various byte values in the mapping table for cp437: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT
and you'll understand why those encodings of the word look the way they do in the MSDOS-Prompt window. (Note: that window tends to display null bytes as spaces.)
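Rather than reading the mapping table by eye, you can let a cp437 decoder do the lookup. The Python sketch below is an approximation of what the window does, under the assumption stated in the note above that null bytes show up as spaces; `as_seen_in_dos` is a made-up name. It prints code points instead of the glyphs themselves so that it runs on any console.

```python
def as_seen_in_dos(raw: bytes) -> str:
    """Interpret raw bytes the way a cp437 display would, with NUL shown as a space."""
    return raw.decode("cp437").replace("\x00", " ")


e_acute = "\u00e9"  # "é"

for name in ("latin-1", "utf-16-le", "utf-8"):
    shown = as_seen_in_dos(e_acute.encode(name))
    print(name, [f"U+{ord(c):04X}" for c in shown])
# latin-1   -> U+0398          (Θ)
# utf-16-le -> U+0398, U+0020  (Θ followed by a space)
# utf-8     -> U+251C, U+2310  (├ followed by ⌐)
```

So a single "é" turns into a Greek capital theta, a theta plus a blank, or a pair of box-drawing and symbol glyphs, depending on which encoding produced the bytes.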
In reply to Re: Reading text file with French characters by graff, in thread Reading text file with French characters by Azih