Re^3: Seeking Perl docs about how UTF8 flag propagates

Could you please provide an example where lc behaves differently depending on the flag?

As far as I know, lc simply preserves the flag of its input (I am not sure whether this holds on EBCDIC platforms).

The opposite function, uc, is known to set the flag for a (non-flagged) input of chr 0xFF or 'ÿ': its uppercase equivalent 'Ÿ' is not present in ISO-8859-1, so it is taken from the Unicode block Latin Extended-A.
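A minimal sketch for checking both claims, assuming a Perl of 5.12 or later with the `unicode_strings` feature enabled (without that feature, and outside a locale, uc leaves chr 0xFF alone). The variable names are made up for illustration; `utf8::is_utf8` is used only to peek at the internal flag.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # assumption: Unicode rules for all strings

# Two copies of the same text, differing only in the internal UTF8 flag.
my $plain   = "ABC";
my $flagged = "ABC";
utf8::upgrade($flagged);         # turns the flag on, characters unchanged

my $lc_plain   = lc $plain;
my $lc_flagged = lc $flagged;
printf "lc, flag off on input: flag=%d\n", utf8::is_utf8($lc_plain)   ? 1 : 0;
printf "lc, flag on on input:  flag=%d\n", utf8::is_utf8($lc_flagged) ? 1 : 0;

# chr 0xFF starts out unflagged; its uppercase form is U+0178,
# which does not fit into a single byte.
my $uc_y = uc chr 0xFF;
printf "uc chr 0xFF: U+%04X flag=%d\n", ord($uc_y), utf8::is_utf8($uc_y) ? 1 : 0;
```

Only code points are printed, so no output layer or `-C` switch is needed to run it.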

Re^4: Seeking Perl docs about how UTF8 flag propagates by hv (Prior) on May 17, 2023 at 00:22 UTC
Not sure about `lc()`, but here's another case where the closely related `uc()` behaves differently: `$ascii = "\x{df}"; chop($utfer = "\x{100}"); $utf = $ascii . $utfer; print uc($_) for ($ascii, $utf);` As a Unicode codepoint, "\x{df}" is interpreted as the lowercase German "es-zed" character (ß), which uppercases to "SS". As an ASCII codepoint it is seen as a non-word character, and does not change. This is a rare case where changing the case of a string also changes its length.
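The same effect can be sketched without the chop trick. This version swaps in `utf8::upgrade` to set the flag and switches `unicode_strings` off explicitly, since that is an assumption the one-liner relies on; the variable names are illustrative only.

```perl
use strict;
use warnings;
no feature 'unicode_strings';    # assumption: legacy byte semantics for unflagged strings

my $bytes = "\x{df}";            # one byte, UTF8 flag off
my $chars = "\x{df}";
utf8::upgrade($chars);           # same character, UTF8 flag on

my $uc_bytes = uc $bytes;        # left unchanged under byte semantics
my $uc_chars = uc $chars;        # becomes "SS" under Unicode semantics

print "inputs compare equal: ", ($bytes eq $chars ? "yes" : "no"), "\n";
printf "flag off: length after uc = %d\n", length $uc_bytes;
printf "flag on:  length after uc = %d\n", length $uc_chars;
```

The two inputs compare equal with `eq`, yet uc gives results of different lengths.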
Re^5: Seeking Perl docs about how UTF8 flag propagates by hippo (Archbishop) on May 17, 2023 at 06:46 UTC
"As an ASCII codepoint": Nitpick: it isn't ASCII. I suspect you meant either ISO-8859-1 or Latin-1 or non-Unicode instead of ASCII, which has a highest codepoint of `\x{7f}`. 🦛
Re^6: Seeking Perl docs about how UTF8 flag propagates by haj (Vicar) on May 17, 2023 at 06:51 UTC
Oops... that was an unintended reply. Sorry for the noise.
Re^5: Seeking Perl docs about how UTF8 flag propagates by haj (Vicar) on May 17, 2023 at 06:34 UTC
Ah, interesting. I missed that because it behaves differently depending on the use of `feature`s: `perl -M5.010 -C -e 'print uc chr 0xdf, "\n"'` prints `ß`, whereas `perl -M5.012 -C -e 'print uc chr 0xdf, "\n"'` prints `SS`.
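The difference comes from `unicode_strings`, which is part of the feature bundle that `use 5.012` turns on. A small sketch that toggles the feature lexically inside one script, so both results can be seen side by side (`hex_codes` is a hypothetical helper added here just for display):

```perl
use strict;
use warnings;

# Hypothetical helper: show a string as space-separated hex code points.
sub hex_codes {
    my ($str) = @_;
    return join ' ', map { sprintf "%02X", ord($_) } split //, $str;
}

my $sharp_s = chr 0xdf;          # UTF8 flag off

my ($legacy, $unicode);
{
    no feature 'unicode_strings';
    $legacy = uc $sharp_s;       # byte semantics: unchanged
}
{
    use feature 'unicode_strings';
    $unicode = uc $sharp_s;      # Unicode semantics: "SS"
}

print "without unicode_strings: ", hex_codes($legacy),  "\n";   # DF
print "with unicode_strings:    ", hex_codes($unicode), "\n";   # 53 53
```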