Could you please provide an example where lc behaves differently depending on the flag?

As far as I know lc will simply preserve the flag of the input (I am not sure whether this holds on EBCDIC platforms).

The opposite function, uc, is known to set the flag for a (non-flagged) input of chr 0xFF ('ÿ'): its uppercase equivalent 'Ÿ' (U+0178) is not present in ISO-8859-1 but comes from the Unicode block Latin Extended-A.
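
A minimal sketch of both claims, for anyone who wants to check them on their own perl. It assumes the unicode_strings feature is switched on so that Unicode rules also apply to non-flagged strings (without it, uc leaves a non-flagged chr 0xFF unchanged). The describe helper and the variable names are made up for this illustration.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # assumption: Unicode rules apply to unflagged strings too

# Describe a one-character string: its codepoint and the state of the UTF8 flag.
sub describe {
    my ($str) = @_;
    return sprintf 'U+%04X, UTF8 flag %s', ord($str), utf8::is_utf8($str) ? 'on' : 'off';
}

my $small_y = chr 0xFF;      # 'ÿ', created without the UTF8 flag
my $upper_y = uc $small_y;   # 'Ÿ' has no Latin-1 codepoint

my $capital_a = chr 0xC0;    # 'À', created without the UTF8 flag
my $lower_a   = lc $capital_a;

print "chr 0xFF:     ", describe($small_y),   "\n";
print "uc(chr 0xFF): ", describe($upper_y),   "\n";
print "chr 0xC0:     ", describe($capital_a), "\n";
print "lc(chr 0xC0): ", describe($lower_a),   "\n";
```

The lc half only covers one character, so it illustrates the "lc preserves the flag" guess rather than proving it.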

Re^4: Seeking Perl docs about how UTF8 flag propagates
by hv (Prior) on May 17, 2023 at 00:22 UTC

    Not sure about lc(), but here's another case where the closely-related uc() behaves differently:

    $ascii = "\x{df}";
    chop($utfer = "\x{100}");
    $utf = $ascii . $utfer;
    print uc($_) for ($ascii, $utf);

    As a Unicode codepoint, "\x{df}" is interpreted as the lowercase German "Eszett" character (ß), which uppercases to "SS". As an ASCII codepoint it is seen as a non-word character, and does not change.

    This is a rare case where changing the case of a string also changes its length.
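
The same effect can be sketched without the chop trick, using utf8::upgrade to set the flag. This is only an illustration under the assumption that unicode_strings is off, as it is for a plain perl -e; the codepoints helper and the variable names are invented here.

```perl
use strict;
use warnings;
no feature 'unicode_strings';   # assumption: legacy rules, as with a plain `perl -e`

# Render a string as a list of codepoints so the output is terminal-safe.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord $_) } split //, $str;
}

my $unflagged = "\x{df}";       # one character, UTF8 flag off
my $flagged   = "\x{df}";
utf8::upgrade($flagged);        # same character, UTF8 flag on

# The two strings compare equal, yet uc() treats them differently.
my $same           = $unflagged eq $flagged;
my $upper_unflag   = uc $unflagged;
my $upper_flagged  = uc $flagged;

print "strings equal: ", ($same ? 'yes' : 'no'), "\n";
printf "flag off: %s -> %s (length %d)\n",
    codepoints($unflagged), codepoints($upper_unflag), length $upper_unflag;
printf "flag on:  %s -> %s (length %d)\n",
    codepoints($flagged), codepoints($upper_flagged), length $upper_flagged;
```

The point of printing the eq result is that the two inputs are the same string as far as Perl's comparison operators are concerned; only the internal representation differs.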

      As an ASCII codepoint

      Nitpick: it isn't ASCII. I suspect you meant ISO-8859-1, Latin-1, or non-Unicode instead of ASCII, which has a highest codepoint of \x{7f}.


      🦛

        Oops... that was an unintended reply. Sorry for the noise.

      Ah, interesting. I missed that because it behaves differently depending on the use of features:

      haj@vdesktop:~$ perl -M5.010 -C -e 'print uc chr 0xdf, "\n"'
      ß
      haj@vdesktop:~$ perl -M5.012 -C -e 'print uc chr 0xdf, "\n"'
      SS
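
The feature that makes the difference between those two runs is most likely unicode_strings, which is part of the feature bundle enabled by use 5.012 but not of the one enabled by use 5.010. A small sketch that toggles just that feature in two lexical scopes, on the same non-flagged input (variable names are made up for the example):

```perl
use strict;
use warnings;

my $sharp_s = chr 0xDF;          # 'ß', created without the UTF8 flag

my ($legacy, $unicode);
{
    no feature 'unicode_strings';    # byte semantics for unflagged strings
    $legacy = uc $sharp_s;
}
{
    use feature 'unicode_strings';   # Unicode semantics regardless of the flag
    $unicode = uc $sharp_s;
}

printf "without unicode_strings: length %d, changed: %s\n",
    length $legacy, ($legacy eq $sharp_s ? 'no' : 'yes');
printf "with unicode_strings:    length %d, result: %s\n",
    length $unicode, $unicode;
```

Because the feature is lexically scoped, the same script can mix both behaviours, which is worth keeping in mind when a module and its caller declare different Perl versions.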