No, "\x{009a}" (a Unicode character) does not map to cp1252.

You did not tell Perl a specific encoding to use for your source code. So Perl assumed that your source code was encoded in Latin-1. Your examples show that you treated your source code as encoded in Windows-1252. So it isn't particularly surprising that Perl and you disagree about some of the characters in your source code (hard-coded into string literals).

So, for example, byte \x9a looks like an accented character when interpreted as Windows-1252 (something that this website also does -- check the headers). It looks just like (is the same character as) the Unicode character "\x{0161}" (š).

But Perl assumes that byte \x9a is Latin-1 and so treats it the same as the Unicode character "\x{009a}", a control character ('single character introducer') that shouldn't be visible if I tried to reproduce it here, and one that is not available in Windows-1252.
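The two readings of that one byte can be seen side by side. This is only a minimal sketch, and it uses Python's built-in codecs as a stand-in for Perl's Encode (an assumption made for illustration; the names `raw` and `describe` are made up here). The code points themselves come from the character set definitions, not from either language.

```python
import unicodedata

raw = b"\x9a"  # the single byte under discussion

# Same byte, two different character set interpretations.
as_cp1252 = raw.decode("cp1252")   # what the author of the script intended
as_latin1 = raw.decode("latin-1")  # what an "assume Latin-1" reader sees

def describe(ch):
    """Return the code point and Unicode general category of one character."""
    return "U+%04X %s" % (ord(ch), unicodedata.category(ch))

print(describe(as_cp1252))  # U+0161 Ll  (a lowercase letter)
print(describe(as_latin1))  # U+009A Cc  (a control character)
```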

So Perl tells you that it can't convert that character to Windows-1252.
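The same kind of refusal can be reproduced outside Perl. The sketch below again assumes Python's codecs rather than Encode, so the error itself will look different from Perl's message; the variable names are hypothetical. It shows that the control character cannot be encoded to Windows-1252, while the character obtained by decoding the byte as Windows-1252 in the first place round-trips cleanly.

```python
# Byte 0x9a read as Latin-1 gives the control character U+009A.
control = b"\x9a".decode("latin-1")

try:
    control.encode("cp1252")
    encodable = True
except UnicodeEncodeError:
    # U+009A has no slot in Windows-1252, so strict encoding fails.
    encodable = False

# Byte 0x9a read as Windows-1252 gives U+0161, which encodes back to 0x9a.
caron_s = b"\x9a".decode("cp1252")
round_trip = caron_s.encode("cp1252")

print(encodable)
print(round_trip == b"\x9a")
```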

Now, it has become very common for things claiming to be Latin-1 to actually include Windows-1252 bytes, with the desire and expectation that they be interpreted as Windows-1252 rather than as Latin-1. So common that the W3C even decided that web pages claiming to be Latin-1 should simply be treated as if they had claimed to be Windows-1252.

And it looks like that decision may have confused, for example, http://www.fileformat.info/info/unicode/char/009a/index.htm, which (for me, anyway) shows a nice hatted 's' despite claiming it is an "Other, Control" type of character (compare to http://www.fileformat.info/info/unicode/char/0161/index.htm).

[ Note that the W3C declaring "treat Latin-1 as Windows-1252" for web pages does not change the definition of either of those character sets, nor does it have any impact on how Encode converts between them or on how Perl treats script source code (not downloaded from a web page). ]
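For reference, the two definitions disagree only on the byte range named in the thread title. The following is a rough sketch using Python's codec tables (an assumption; Encode's own tables may handle the few bytes that Windows-1252 leaves undefined differently, and here they are simply replaced with U+FFFD so that they count as a mismatch).

```python
differing = []
for value in range(256):
    byte = bytes([value])
    latin1 = byte.decode("latin-1")
    # errors="replace" turns bytes undefined in this cp1252 table into U+FFFD.
    cp1252 = byte.decode("cp1252", errors="replace")
    if latin1 != cp1252:
        differing.append(value)

# Report the first and last byte on which the two decodings disagree.
print("0x%02x .. 0x%02x" % (differing[0], differing[-1]))  # 0x80 .. 0x9f
```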

- tye        


In reply to Re: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding) by tye
in thread Windows-1252 characters from \x{0080} thru \x{009f} by Jim
