in reply to Re^3: Seeking Perl docs about how UTF8 flag propagates
in thread Seeking Perl docs about how UTF8 flag propagates

Not sure about lc(), but here's another case where the closely-related uc() behaves differently:

$ascii = "\x{df}";
chop($utfer = "\x{100}");
$utf = $ascii . $utfer;
print uc($_) for ($ascii, $utf);

As a Unicode codepoint, "\x{df}" is interpreted as the lowercase German "eszett" character (ß), which uppercases to "SS". As an ASCII codepoint it is seen as a non-word character, and does not change.

This is a rare case where changing the case of a string also changes its length.
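
For anyone who wants to see the flag and the length change explicitly, here is a minimal sketch. It assumes no locale is in effect, and it uses utf8::upgrade() rather than the chop trick to turn the flag on, which is my substitution. The variable names are only for illustration.

```perl
use strict;
use warnings;
no feature 'unicode_strings';   # assumption: make sure the feature is off (needs perl 5.12+)

my $bytes = "\x{df}";           # one character, UTF8 flag off
my $chars = "\x{df}";
utf8::upgrade($chars);          # same character, UTF8 flag on

my $uc_bytes = uc $bytes;       # case rules for codepoints above 0x7f are not applied
my $uc_chars = uc $chars;       # Unicode case rules are applied

# Print codepoints in hex so the output does not depend on terminal encoding.
for my $pair ([ 'flag off', $bytes, $uc_bytes ], [ 'flag on', $chars, $uc_chars ]) {
    my ($label, $in, $out) = @$pair;
    printf "%-8s is_utf8=%d  uc length=%d  uc codepoints=%s\n",
        $label,
        utf8::is_utf8($in) ? 1 : 0,
        length $out,
        join ' ', map { sprintf '%02X', ord } split //, $out;
}

# The two inputs still compare equal as strings; only uc() tells them apart.
print "inputs equal: ", ($bytes eq $chars ? "yes" : "no"), "\n";
```

The first row should report a length of 1 and the second a length of 2, even though the two input strings compare equal.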

Re^5: Seeking Perl docs about how UTF8 flag propagates
by hippo (Archbishop) on May 17, 2023 at 06:46 UTC
    As an ASCII codepoint

    Nitpick: it isn't ASCII. I suspect you meant ISO-8859-1 (Latin-1) or non-Unicode rather than ASCII, whose highest codepoint is \x{7f}.


    🦛

      Oops... that was an unintended reply. Sorry for the noise
Re^5: Seeking Perl docs about how UTF8 flag propagates
by haj (Vicar) on May 17, 2023 at 06:34 UTC

    Ah, interesting. I missed that because it behaves differently depending on the use of features:

    haj@vdesktop:~$ perl -M5.010 -C -e 'print uc chr 0xdf, "\n"'
    ß
    haj@vdesktop:~$ perl -M5.012 -C -e 'print uc chr 0xdf, "\n"'
    SS
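
The difference between the two runs appears to come from the unicode_strings feature, which is part of the 5.12 feature bundle. Below is a small sketch that toggles that feature lexically inside one script instead of changing -M. It assumes perl 5.12 or later and no locale; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $sharp_s = chr 0xdf;   # single character, UTF8 flag off

# Same string, same uc() call; only the lexically scoped feature differs.
my $without = do { no feature 'unicode_strings';  uc $sharp_s };
my $with    = do { use feature 'unicode_strings'; uc $sharp_s };

printf "without unicode_strings: length %d\n", length $without;
printf "with unicode_strings:    length %d\n", length $with;
```

With the feature enabled, uc() uses Unicode rules regardless of the internal flag, so the second result is the two-character "SS".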