in reply to Re^3: Windows-1252 characters from \x{0080} thru \x{009f} (source-code encoding)
in thread Windows-1252 characters from \x{0080} thru \x{009f}
Since the strings passed to those were created by assigning each byte to a character, each byte is taken to be a Unicode code point. Not an iso-8859-1 character.
Interpreting a byte as a Unicode code point is exactly equivalent to decoding it as Latin-1: ISO-8859-1 maps each byte value 0x00-0xFF to the code point with the same number. That is why people say "Perl assumes ISO-8859-1", and that isn't wrong.
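A minimal sketch of that equivalence, assuming the core Encode module is available. The variable names are mine and purely illustrative. It builds a string holding every byte value, decodes a copy as ISO-8859-1, and compares the result with the undecoded string; it then decodes the single byte 0x80 as cp1252 to show where the two encodings part ways.

```perl
use strict;
use warnings;
use Encode qw(decode);

# A string containing every byte value 0x00..0xFF, one character per byte.
# (Illustrative name; nothing here comes from the thread's original code.)
my $bytes = join '', map { chr } 0 .. 255;

# Decode a copy explicitly as Latin-1. A copy is passed so the original
# string is certain to stay untouched.
my $latin1 = decode('ISO-8859-1', my $copy = $bytes);

# Perl compares strings code point by code point, so this checks that
# "byte taken as a code point" and "byte decoded as Latin-1" agree.
print "Latin-1 decode matches raw bytes: ",
      ($latin1 eq $bytes ? "yes" : "no"), "\n";

# The same byte decoded as cp1252 is a different code point:
# 0x80 is the euro sign, U+20AC, in Windows-1252.
my $euro = decode('cp1252', my $octet = "\x80");
printf "0x80 taken as a code point: U+%04X\n", ord("\x80");
printf "0x80 decoded as cp1252:     U+%04X\n", ord($euro);
```

So getting cp1252 semantics for the \x{0080} thru \x{009f} range takes an explicit decode step; it does not happen by itself.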
Because there is no default, it also means the default cannot be changed, to cp1252 or anything else.
Such a change is possible, though not as easy as it sounds. It would require Perl to keep track of which strings hold bytes and which hold code points, which would be a major departure from the current model (but inevitable in the long run, IMHO).