Re^5: german Alphabet

It might be important to note that when one prints a wide string whose characters all happen to be representable in Latin-1, Perl outputs Latin-1 with no warning:

$ perl -w -Mutf8 -E'print "ê"' | hd
00000000  ea                                                |.|
00000001

"ê" is decoded into a character (U+00EA) but then printed to a handle that has no :encoding(...) or :utf8 PerlIO layer. Since it's representable in Latin-1, the single-byte encoding is used and no warning is shown.

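A minimal sketch of the first case that doesn't depend on how the script file is saved. It assumes the source is UTF-8, so the literal "ê" is the two bytes C3 AA; `Encode::decode('UTF-8', ...)` turns those into the same one-character string that -Mutf8 produces. An in-memory filehandle with no layers stands in for STDOUT piped to hd (an assumption of this sketch).

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes a UTF-8 editor saves for the literal "ê".
my $source_bytes = "\xC3\xAA";

# Decoding them gives what -Mutf8 gives for the literal:
# a single character, U+00EA.
my $e_circ = decode('UTF-8', $source_bytes);
printf "length=%d ord=0x%X\n", length $e_circ, ord $e_circ;   # prints: length=1 ord=0xEA

# Print it to an in-memory handle with no :encoding(...) layer
# (standing in for STDOUT piped to hd), capturing any warnings.
my $out = '';
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    open my $fh, '>', \$out or die "open: $!";
    print {$fh} $e_circ;
    close $fh or die "close: $!";
}
printf "bytes=%s warnings=%d\n", unpack('H*', $out), scalar @warnings;   # prints: bytes=ea warnings=0
```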
$ perl -w -Mutf8 -E'print "ы"' | hd
Wide character in print at -e line 1.
00000000  d1 8b                                             |..|
00000002

Similar situation, but "ы" (U+044B) cannot be represented in Latin-1, so we get a warning and its UTF-8 bytes instead.
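A sketch of the second case, compared with an explicit `:encoding(UTF-8)` layer, which is the usual way to tell Perl that a handle expects characters. The helper `print_and_read_back` is hypothetical: it prints through a temporary file (via the core File::Temp module), reads the raw bytes back, and collects warnings instead of sending them to STDERR.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: print $string to a fresh temporary file through
# the given layer ('' means no encoding layer at all) and return the
# raw bytes that ended up in the file, plus any warnings print emitted.
sub print_and_read_back {
    my ($string, $layer) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    my ($fh, $filename) = tempfile(UNLINK => 1);
    binmode($fh, length $layer ? $layer : ':raw') or die "binmode: $!";
    print {$fh} $string;
    close $fh or die "close: $!";

    open my $in, '<:raw', $filename or die "open: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    return ($bytes, \@warnings);
}

my $yeru = "\x{44B}";    # "ы", CYRILLIC SMALL LETTER YERU

# No layer: U+044B doesn't fit in one byte, so print warns
# ("Wide character in print") and writes the UTF-8 encoding.
my ($plain, $plain_warnings) = print_and_read_back($yeru, '');
printf "no layer:         %s, %d warning(s)\n",
    unpack('H*', $plain), scalar @$plain_warnings;

# :encoding(UTF-8) layer: the handle expects characters, so the
# same bytes are written without a warning.
my ($encoded, $encoded_warnings) = print_and_read_back($yeru, ':encoding(UTF-8)');
printf ":encoding(UTF-8): %s, %d warning(s)\n",
    unpack('H*', $encoded), scalar @$encoded_warnings;
```

Both handles end up with d1 8b; only the one without a layer warns.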

$ perl -w -E'print "ê"' | hd
00000000  c3 aa                                             |..|
00000002

(My terminal is UTF-8. Without -Mutf8, no decoding or encoding is done in this case: the literal stays as the two bytes C3 AA, and Perl operates on those bytes.)

Re^6: german Alphabet by ikegami (Patriarch) on Dec 16, 2018 at 19:53 UTC
No. Perl never uses Latin-1.

In the first case (`print "\xEA";`), Perl is expecting bytes, and you provided a string of bytes, so it printed the bytes as-is. It didn't warn because you provided what was expected.

In the second case (`print "\x{44B}";`), Perl is expecting bytes, and you didn't provide a string of bytes, so it guesses that you meant to encode the string using UTF-8, does so, and warns.

In the third case (`print "\xC3\xAA";`), Perl is expecting bytes, and you provided a string of bytes, so it printed the bytes as-is. It didn't warn because you provided what was expected.

(A string of bytes is a string consisting entirely of characters with a value less than 256.)
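The rule in this reply can be written down as a small model. This is a simplified illustration of the observable behavior, not how Perl implements print internally, and `is_byte_string` and `bytes_print_would_emit` are hypothetical names. The model writes byte strings as-is and otherwise falls back to UTF-8 plus a warning, applied here to the three strings from the cases above.

```perl
use strict;
use warnings;

# A string of bytes: every character has a value less than 256.
sub is_byte_string {
    my ($s) = @_;
    return $s =~ /\A[\x00-\xFF]*\z/ ? 1 : 0;
}

# Simplified model of print to a handle without an encoding layer:
# byte strings are written as-is; anything else is encoded as UTF-8
# and triggers a "Wide character" warning.
sub bytes_print_would_emit {
    my ($s) = @_;
    return ($s, 0) if is_byte_string($s);
    my $copy = $s;
    utf8::encode($copy);    # characters -> UTF-8 bytes, in place
    return ($copy, 1);
}

for my $case ("\xEA", "\x{44B}", "\xC3\xAA") {
    my ($bytes, $warns) = bytes_print_would_emit($case);
    my $chars = join ' ', map { sprintf('U+%04X', ord($_)) } split //, $case;
    printf "%s -> %s%s\n", $chars, unpack('H*', $bytes), $warns ? ' (warns)' : '';
}
```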
Re^7: german Alphabet by Anonymous Monk on Dec 16, 2018 at 21:47 UTC
I think I understand it now: decoding `"\xC3\xAA"` from UTF-8 yields a code point with a value less than 256, `U+00EA`, and `"\xEA"` just happens to be the Latin-1 encoding of the same code point because of the way Unicode was designed, not because of a Perl quirk. Thank you for correcting me.
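This can be checked with the core Encode module, as a quick sketch: decoding C3 AA as UTF-8 and decoding EA as Latin-1 (ISO-8859-1) give the same one-character string, and every Latin-1 byte from 0x00 to 0xFF decodes to the code point with the same number.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "ê" as UTF-8 bytes and as a Latin-1 byte.
my $from_utf8   = decode('UTF-8',      "\xC3\xAA");
my $from_latin1 = decode('ISO-8859-1', "\xEA");
printf "U+%04X U+%04X same=%s\n",
    ord $from_utf8, ord $from_latin1,
    $from_utf8 eq $from_latin1 ? 'yes' : 'no';   # prints: U+00EA U+00EA same=yes

# Latin-1 byte N decodes to code point N for every N in 0..255.
my @mismatches = grep { ord(decode('ISO-8859-1', chr $_)) != $_ } 0 .. 255;
printf "mismatches: %d\n", scalar @mismatches;   # prints: mismatches: 0
```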