in reply to Re^3: Default encoding rules leave me puzzled...
in thread Default encoding rules leave me puzzled...
Code points is an abstraction, it's an internal Perl thing.
What are you talking about? It has nothing to do with Perl. "e" is formed from the code point U+0065, "é" is formed from code point U+00E9 or from code points U+0065 + U+0301, etc. This is defined by The Unicode Consortium, not by Perl.
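To see that these numbers are the same ones Unicode publishes, here is a minimal sketch. The helper name `code_points` is made up for illustration; `NFC` and `NFD` come from the core `Unicode::Normalize` module.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: list the code points of a string as U+XXXX labels.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

my $precomposed = "\x{E9}";      # é as one code point
my $decomposed  = "e\x{301}";    # e followed by COMBINING ACUTE ACCENT

print code_points($precomposed), "\n";        # U+00E9
print code_points($decomposed), "\n";         # U+0065 U+0301
print code_points(NFD($precomposed)), "\n";   # decomposed form of é

# Both spellings normalize to the same string.
print NFC($decomposed) eq $precomposed ? "same after NFC\n" : "different\n";
```

Nothing in the helper is Perl-specific beyond `ord`; any Unicode-aware language should report the same values.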
It must produce a bunch of bytes.
No, the input must be a string of integers in 0..255, which it is. print has no problem storing those as bytes. iso-latin-1 doesn't factor into it.
In which of the following does print use iso-latin-1?
    use utf8;
    use Socket qw(inet_aton);
    use Encode qw(encode_utf8);

    my $s1 = inet_aton('195.169.195.171'); print($s1);
    my $s2 = encode_utf8("éë");            print($s2);
    my $s3 = "éë";                         print($s3);
    my $s4 = "\xC3\xA9\xC3\xAB";           print($s4);
The only two possible answers are "all of them" or "none of them", since print can't tell the difference between those strings.
If you claim that iso-latin-1 is used, then you claim that use utf8; produces iso-latin-1. It doesn't. It produces Unicode code points.
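A rough way to check this is to dump the integers each string holds, which is all print gets to work with. This is only a sketch: `hex_chars` is a hypothetical helper, and the file is assumed to be saved as UTF-8 so that `use utf8;` decodes the literal.

```perl
use strict;
use warnings;
use utf8;                        # the "éë" literal is decoded from UTF-8 source
use Socket qw(inet_aton);
use Encode qw(encode_utf8);

# Hypothetical helper: show each character of a string as a two-digit hex number.
sub hex_chars {
    my ($str) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $str;
}

my $s1 = inet_aton('195.169.195.171');   # packed IPv4 address
my $s2 = encode_utf8("éë");              # UTF-8 encoding of two code points
my $s3 = "éë";                           # the two code points themselves
my $s4 = "\xC3\xA9\xC3\xAB";             # four characters given by escapes

print hex_chars($_), "\n" for $s1, $s2, $s3, $s4;
```

Three of the strings hold the same four integers and the third holds two, but every element is in 0..255, and nothing in the strings records where they came from.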
That prints garbage instead of 'ç'.
Because the terminal expects UTF-8 bytes, but it got the Unicode code points themselves written out as bytes.
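The difference can be sketched by capturing what print writes, with and without encoding first. `printed_bytes` is a hypothetical helper that prints to an in-memory handle, and a UTF-8 terminal is assumed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: print a string to an in-memory handle with no
# encoding layer and return the bytes written, as uppercase hex.
sub printed_bytes {
    my ($str) = @_;
    open my $fh, '>:raw', \my $buf or die "Cannot open in-memory handle: $!";
    print {$fh} $str;
    close $fh;
    return uc(unpack('H*', $buf));
}

my $text = "\x{E7}";    # ç, U+00E7

print printed_bytes($text), "\n";                    # E7: the code point stored as a byte
print printed_bytes(encode('UTF-8', $text)), "\n";   # C3A7: the UTF-8 encoding of ç
```

A UTF-8 terminal cannot decode a lone E7 byte, hence the garbage. Encoding explicitly, or adding a layer with `binmode(STDOUT, ':encoding(UTF-8)')`, sends C3 A7 instead.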