First, I'm impressed that you were able to convey the display contents of the MSDOS-Prompt window -- thanks for that.

(Update: After code tags were added to "tidy things up", it seems the nice DOS glyphs are gone. Too bad... maybe the janitors can restore the earlier form, which I thought was quite clear.) (thanks, Arunbear!)

Second, in order to display your text correctly in the MSDOS-Prompt window, the encoding you need to use is the one called cp437. Just convert your text to that encoding, and it should look just fine.
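Here is a minimal sketch of that conversion, assuming your text is already a decoded Perl character string. The variable names are just for illustration, and I'm using the standard Encode module:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "Québécois" as a Perl character string; \x{E9} is U+00E9 (e with acute).
my $text = "Qu\x{E9}b\x{E9}cois";

# Encode the characters into cp437 bytes, the code page the
# MSDOS-Prompt window is assumed to be using here.
my $cp437_bytes = encode('cp437', $text);

# Write the bytes as they are, with no further translation layer.
binmode(STDOUT);
print $cp437_bytes, "\n";
```

An alternative that should have the same effect is to leave the string alone and put an encoding layer on the handle instead: `binmode(STDOUT, ':encoding(cp437)')`.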

It seems like you have a good understanding of what it means to convert text data to different encodings for output, and your different renderings of "Québécois" make sense, given that they are being viewed with a cp437-based display tool.

For ISO-8859-1, CP1252 and Unicode, the numeric code for "é" is 0xE9. When expressed in UTF-16LE, that becomes the two-byte sequence "\xE9\x00" (the 16-bit value 0x00E9, low byte first); when converted to UTF-8, it becomes the two-byte sequence "\xC3\xA9" (perlunicode explains why this is so, in the section titled "Unicode Encodings", about halfway down).
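If you want to check those byte sequences yourself, something like the following should do it. The `hex_bytes` helper is a name I made up for this sketch; only `encode` comes from the Encode module:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string and return its bytes
# as space-separated hex, e.g. "C3 A9".
sub hex_bytes {
    my ($encoding, $string) = @_;
    my $bytes = encode($encoding, $string);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

my $e_acute = "\x{E9}";    # U+00E9

# Show how the same character comes out in each encoding.
for my $encoding ('iso-8859-1', 'cp1252', 'UTF-16LE', 'UTF-8') {
    printf "%-10s %s\n", $encoding, hex_bytes($encoding, $e_acute);
}
```

The two single-byte encodings give one byte (E9), while UTF-16LE and UTF-8 each give the two-byte sequences described above.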

Also, your conversions to Unicode have caused the "byte-order mark" (BOM) to be included at the beginning of the string. The BOM is code point U+FEFF; in UTF-16LE, that's "\xFF\xFE", and in UTF-8, it's "\xEF\xBB\xBF".
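As a rough illustration (not necessarily how your data got its BOM), this sketch shows the BOM bytes in both encodings, and one way to drop a leading BOM after decoding. The sample byte string is made up for the example:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $bom = "\x{FEFF}";    # the byte-order mark as a single character

# The BOM's byte sequences in the two Unicode encodings discussed above.
my $bom_utf16le = encode('UTF-16LE', $bom);    # two bytes
my $bom_utf8    = encode('UTF-8',    $bom);    # three bytes

# Sample input: UTF-8 bytes for "Québécois" with a BOM in front.
my $raw = "\xEF\xBB\xBFQu\xC3\xA9b\xC3\xA9cois";

# Decoding keeps the BOM as a U+FEFF character, so strip it explicitly.
my $decoded = decode('UTF-8', $raw);
$decoded =~ s/\A\x{FEFF}//;

printf "%d characters after stripping the BOM\n", length $decoded;
```

Once the BOM is stripped before re-encoding, it won't show up as stray glyphs at the start of the displayed word.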

You can look up those various byte values in the mapping table for cp437: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT
and you'll understand why those encodings of the word look the way they do in the MSDOS-Prompt window. (Note: that window tends to display null bytes as spaces.)
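You can also let Perl do the table lookup for you. This sketch assumes the window interprets every byte as cp437, so it encodes the word as UTF-8 and then decodes those same bytes as cp437 to see which characters the window would pick:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "Qu\x{E9}b\x{E9}cois";

# The bytes that get written when the text is output as UTF-8 ...
my $utf8_bytes = encode('UTF-8', $text);

# ... and the characters a cp437-based display would show for them.
my $as_seen = decode('cp437', $utf8_bytes);

# Print the code points in ASCII so the output is safe on any console.
print join(' ', map { sprintf 'U+%04X', ord } split //, $as_seen), "\n";
```

Each "é" turns into two characters here (U+251C and U+2310, a box-drawing glyph and a reversed not sign), which is the kind of two-glyph garbage you get when UTF-8 is viewed as cp437.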


In reply to Re: Reading text file with French characters by graff
in thread Reading text file with French characters by Azih
