Re^4: Seeking Perl docs about how UTF8 flag propagates

Not sure about lc(), but here's another case where the closely-related uc() behaves differently:

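$ascii = "\x{df}";                 # one character, stored natively (UTF8 flag off)
chop($utfer = "\x{100}");          # empty string that still carries the UTF8 flag
$utf = $ascii . $utfer;            # concatenation upgrades the result: same character, flag on
print uc($_) for ($ascii, $utf);   # first is left unchanged, second becomes "SS"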

As a Unicode codepoint, "\x{df}" is interpreted as the lowercase German "Eszett" character (ß), which uppercases to "SS". As an ASCII codepoint it is seen as a non-word character, and does not change.

This is a rare case where changing the case of a string also changes its length.
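To make the length change visible, here is a minimal sketch that compares the same character with and without the UTF8 flag. It assumes no `use locale` and no `unicode_strings` feature in scope; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $native   = "\x{df}";    # single character, UTF8 flag off
my $upgraded = $native;
utf8::upgrade($upgraded);   # same character, UTF8 flag on

for my $s ($native, $upgraded) {
    my $uc = uc $s;
    # Report the flag state and the length before and after uc()
    printf "flag=%d  length before=%d  after=%d\n",
        utf8::is_utf8($s) ? 1 : 0, length($s), length($uc);
}
```

The two strings compare equal with `eq`, yet only the flagged one grows to two characters under uc().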

Re^5: Seeking Perl docs about how UTF8 flag propagates by hippo (Archbishop) on May 17, 2023 at 06:46 UTC
"As an ASCII codepoint"
Nitpick: it isn't ASCII. I suspect you meant either ISO-8859-1 or Latin-1 or non-Unicode instead of ASCII, which has a highest codepoint of `\x{7f}`. 🦛
Re^6: Seeking Perl docs about how UTF8 flag propagates by haj (Vicar) on May 17, 2023 at 06:51 UTC
Oops... that was an unintended reply. Sorry for the noise.
Re^5: Seeking Perl docs about how UTF8 flag propagates by haj (Vicar) on May 17, 2023 at 06:34 UTC
Ah, interesting. I missed that because it behaves differently depending on the use of `feature`s:

haj@vdesktop:~$ perl -M5.010 -C -e 'print uc chr 0xdf, "\n"'
ß
haj@vdesktop:~$ perl -M5.012 -C -e 'print uc chr 0xdf, "\n"'
SS
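The difference between the two command lines is most likely down to the `unicode_strings` feature, which `use v5.12` (and so `-M5.012`) turns on. A small sketch of the same effect inside one script, assuming Perl 5.12 or later and no `use locale`; the variable names are illustrative:

```perl
use strict;
use warnings;

my $s = chr 0xdf;                    # native 8-bit string, UTF8 flag off

my $without = uc $s;                 # no case mapping applied above 0x7f
my $with;
{
    use feature 'unicode_strings';   # also enabled by "use v5.12" or later
    $with = uc $s;                   # Unicode rules apply regardless of the flag
}

printf "without feature: length %d\n", length $without;
printf "with feature:    length %d\n", length $with;
```

Because the feature is lexically scoped, only the uc() call inside the block sees Unicode semantics.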