
It might be important to note that when one tries to print a wide string that happens to be representable in latin-1, Perl uses latin-1 with no warnings:
$ perl -w -Mutf8 -E'print "ê"' | hd
00000000  ea                                                |.|
00000001
"ê" is decoded into a character but then printed to a handle that doesn't have an :encoding(...) or :utf8 I/O layer. Since it's representable in latin-1, the single-byte encoding is used and no warning is shown.
$ perl -w -Mutf8 -E'print "ы"' | hd
Wide character in print at -e line 1.
00000000  d1 8b                                             |..|
00000002
Similar situation, but "ы" cannot be represented in latin-1, so we get a warning and UTF-8 bytes instead.
$ perl -w -E'print "ê"' | hd
00000000  c3 aa                                             |..|
00000002
(My terminal is UTF-8. No decoding or encoding is done in this case; Perl operates on bytes.)
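A minimal sketch, assuming the output should be UTF-8 like the terminal above: giving the handle an `:encoding(UTF-8)` layer makes Perl encode every character explicitly, so U+00EA ("ê") and U+044B ("ы") both come out as UTF-8 and neither triggers a warning. `encoded_output` is a hypothetical helper that prints into an in-memory filehandle so the resulting bytes can be inspected; in a real script the equivalent is `binmode STDOUT, ':encoding(UTF-8)';`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the bytes produced by printing $str
# through an :encoding(UTF-8) layer (here on an in-memory filehandle).
sub encoded_output {
    my ($str) = @_;
    my $buf = '';
    open my $fh, '>:encoding(UTF-8)', \$buf or die "open: $!";
    print {$fh} $str;
    close $fh;    # flush the encoded bytes into $buf
    return $buf;
}

# "\x{EA}" is "ê" and "\x{44B}" is "ы"; both are encoded the same way.
print unpack('H*', encoded_output("\x{EA}")),  "\n";    # c3aa
print unpack('H*', encoded_output("\x{44B}")), "\n";    # d18b
```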

Re^6: german Alphabet
by ikegami (Patriarch) on Dec 16, 2018 at 19:53 UTC

    No. Perl never uses latin-1.

    In the first case (print "\xEA";), Perl is expecting bytes, and you provided a string of bytes, so it printed the bytes (as-is). It didn't warn because you provided what was expected.

    In the second case (print "\x{44B}";), Perl is expecting bytes, and you didn't provide a string of bytes, so it guesses that you meant to encode them using UTF-8, does so, and warns.

    In the third case (print "\xC3\xAA";), Perl is expecting bytes, and you provided a string of bytes, so it printed the bytes (as-is). It didn't warn because you provided what was expected.

    (A string of bytes is a string consisting entirely of characters with a value less than 256.)
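To make that definition concrete, here is a small sketch of a check for a "string of bytes" in this sense, applied to the strings from the three cases. `is_byte_string` is a hypothetical helper name, not a Perl builtin.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character in $str has a value
# below 256, i.e. Perl can print it to a handle without an encoding
# layer without guessing and without a "Wide character" warning.
sub is_byte_string {
    my ($str) = @_;
    return $str !~ /[^\x00-\xFF]/;
}

# The strings from the three cases: "\xEA", "\x{44B}" and "\xC3\xAA".
for my $s ("\xEA", "\x{44B}", "\xC3\xAA") {
    my $codes = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $s;
    printf "%-15s %s\n", $codes, is_byte_string($s) ? 'bytes' : 'not bytes';
}
```

Only the second string fails the check, which is exactly the case where Perl warned.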

      I think I understand it now: decoding "\xC3\xAA" from UTF-8 produces a single character whose code point is less than 256, U+00EA, and "\xEA" happens to be the latin-1 encoding of that same code point only because Unicode was designed so that its first 256 code points match latin-1. It's not a Perl quirk.

      Thank you for correcting me.
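The reasoning in that last reply can be checked with the core Encode module. A minimal sketch: decoding the UTF-8 bytes C3 AA yields one character, U+00EA; encoding that character as ISO-8859-1 (latin-1) gives the single byte EA; and the decoded character compares equal to the literal "\xEA".

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decoding the two UTF-8 bytes C3 AA yields a single character, U+00EA.
my $char = decode('UTF-8', "\xC3\xAA");
printf "length %d, U+%04X\n", length($char), ord($char);   # length 1, U+00EA

# Encoding that character as latin-1 gives the single byte EA, because
# Unicode's first 256 code points coincide with ISO-8859-1.
my $latin1 = encode('ISO-8859-1', $char);
print unpack('H*', $latin1), "\n";                         # ea

# The decoded character is the same string as the literal "\xEA".
print $char eq "\xEA" ? "same string\n" : "different\n";   # same string
```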