in reply to Windows-1252 characters from \x{0080} thru \x{009f}
No, "\x{009a}" (a Unicode character) does not map to cp1252.
You did not tell Perl what encoding your source code uses, so Perl assumed that your source code was encoded in Latin-1. Your examples show that you wrote your source code as Windows-1252. So it isn't particularly surprising that you and Perl disagree about some of the characters hard-coded into the string literals in your source code.
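You can list exactly which bytes the two readings disagree on. The sketch below uses Python rather than Perl, purely for illustration, and `differing_bytes` is a made-up name. It decodes every single byte value both as Latin-1 and as Windows-1252 and collects the values where the results differ. The two character sets turn out to differ only in 0x80 through 0x9F, the range from the original question.

```python
def differing_bytes():
    """Return the byte values whose Latin-1 and Windows-1252 decodings differ.

    Illustrative helper (hypothetical name), not part of any library.
    """
    diffs = []
    for b in range(256):
        raw = bytes([b])
        # Latin-1 maps byte N straight to code point U+00NN.
        as_latin1 = raw.decode("latin-1")
        # Python's cp1252 codec leaves a few bytes (e.g. 0x81) undefined;
        # errors="replace" turns those into U+FFFD so they count as differing.
        as_cp1252 = raw.decode("cp1252", errors="replace")
        if as_latin1 != as_cp1252:
            diffs.append(b)
    return diffs


diffs = differing_bytes()
print(f"{len(diffs)} bytes differ: 0x{diffs[0]:02X} through 0x{diffs[-1]:02X}")
```

Everything from 0x00 to 0x7F (ASCII) and from 0xA0 to 0xFF is the same in both, which is why the mix-up only shows up for a handful of characters.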
So, for example, byte \x9a looks like an accented character when interpreted as Windows-1252 (which is also how this website interprets it; check the headers). It is the same character as the Unicode character "\x{0161}" (š, a lowercase 's' with a caron).
But Perl assumes that byte \x9a is Latin-1 and so treats it as the Unicode character "\x{009a}" (a control character, 'single character introducer', which shouldn't be visible if I tried to reproduce it here). That character has no representation in Windows-1252.
So Perl tells you that it can't convert that character to Windows-1252.
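Here is the same story for byte \x9a in a few lines, again sketched in Python rather than Perl (the variable names are just for illustration). It decodes the byte both ways, inspects what comes out, and tries converting each result back to Windows-1252. Only the Latin-1 reading fails, which is Python's counterpart of the `"\x{009a}" does not map to cp1252` error.

```python
import unicodedata

raw = b"\x9a"  # the single byte as it sits in the source file

as_cp1252 = raw.decode("cp1252")   # how the author meant it
as_latin1 = raw.decode("latin-1")  # how Perl read it (no source encoding declared)

# The Windows-1252 reading is a named, printable letter.
print(hex(ord(as_cp1252)), unicodedata.name(as_cp1252))
# The Latin-1 reading is a C1 control character (general category "Cc").
print(hex(ord(as_latin1)), unicodedata.category(as_latin1))

# U+0161 round-trips back to the original Windows-1252 byte ...
assert as_cp1252.encode("cp1252") == raw

# ... but U+009A has no Windows-1252 byte at all, so encoding it fails.
try:
    as_latin1.encode("cp1252")
except UnicodeEncodeError as err:
    print("cannot encode:", err.reason)
```

The fix in either language is the same idea: tell the tool which encoding the bytes are really in before it turns them into characters.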
Now, it has become very common for text labeled as Latin-1 to actually contain bytes from Windows-1252, with the expectation that they will be interpreted as Windows-1252 rather than as Latin-1. It is so common that the W3C even decided that web pages declaring Latin-1 should simply be treated as if they had declared Windows-1252.
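A rough sketch of that web-page rule, in Python. `decode_web_text` and `WEB_WINDOWS_1252_LABELS` are hypothetical names, and the label set covers only a few of the labels browsers fold into Windows-1252, not the full list. It applies only when decoding downloaded pages; Python's own `latin-1` codec is left untouched, just as Encode is.

```python
# A few of the labels that browsers treat as Windows-1252 (not the full list).
WEB_WINDOWS_1252_LABELS = {"iso-8859-1", "latin1", "windows-1252"}


def decode_web_text(data: bytes, declared_label: str) -> str:
    """Decode page bytes using the web rule "Latin-1 means Windows-1252".

    Hypothetical helper for illustration only. Python's cp1252 codec leaves
    bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, while browsers map them
    to the matching C1 controls, so errors="replace" is only an approximation.
    """
    label = declared_label.strip().lower()
    if label in WEB_WINDOWS_1252_LABELS:
        return data.decode("cp1252", errors="replace")
    # Any other label is passed to Python's codec lookup as-is.
    return data.decode(label)


print(decode_web_text(b"\x9a", "ISO-8859-1"))
```

Outside of this web-specific helper, `b"\x9a".decode("latin-1")` still gives U+009A, because the rule changes how pages are read, not what Latin-1 is.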
And it looks like that decision may have confused, for example, http://www.fileformat.info/info/unicode/char/009a/index.htm, which (for me, anyway) shows a nice hatted 's' (š) despite describing the character as type "Other, Control" (compare http://www.fileformat.info/info/unicode/char/0161/index.htm).
[ Note that the W3C declaring "treat Latin-1 as Windows-1252" for web pages does not change the definition of either of those character sets, nor does it have any impact on how Encode converts between them or on how Perl treats script source code (not downloaded from a web page). ]
- tye