in reply to Re^4: Seeking Perl docs about how UTF8 flag propagates
in thread Seeking Perl docs about how UTF8 flag propagates
I'd need to (laboriously) check the source for chapter and verse, but as far as I remember, in all the obvious cases, when any of the inputs has UTF8 on, the output will too.
Here's an example commonly used in perl's tests to create a UTF8-flagged string by appending a flagged zero-length string:
% perl -MDevel::Peek -wle '
    $x="\x{100}"; Dump($x);
    chop $x; Dump($x);
    $y = "foo"; Dump($y);
    $y .= $x; Dump($y)
' 2>&1 | grep FLAGS
FLAGS = (POK,IsCOW,pPOK,UTF8)
FLAGS = (POK,pPOK,UTF8)
FLAGS = (POK,IsCOW,pPOK)
FLAGS = (POK,pPOK,UTF8)
%
Your examples certainly all appear to propagate the flag. However, substr() appears to propagate it only if the source string has characters above 0x7f; I have no idea why that appears to be an exception. I also do not know of any guarantee that any of these behaviours will be retained in future perl versions (though I think it is hugely unlikely that the steps involved in the code above will change, due to its widespread use in core).
Update: from perl source, it appears substr returns a UTF8_off result if the source string has byte length and character length the same.
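A minimal sketch for checking the substr behaviour described in the update, using utf8::is_utf8 rather than Devel::Peek. The helper names flag_state and substr_flag are made up for this illustration. The substring is assigned to a variable first, because substr passed directly as a sub argument is treated as an lvalue and may not behave the same way. Going by the update, I'd expect "off" for the all-ASCII source and "on" for the one containing \x{100}, but treat that as something to verify on your own perl rather than a guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: report the internal UTF8 flag of a scalar.
sub flag_state { return utf8::is_utf8($_[0]) ? "on" : "off" }

# Hypothetical helper: take a plain (rvalue) substr of the source,
# store it in a variable, and report that copy's flag.
sub substr_flag {
    my ($source) = @_;
    my $part = substr($source, 0, 2);
    return flag_state($part);
}

# Flagged source with only ASCII characters:
# byte length and character length are the same.
my $ascii = "foo";
utf8::upgrade($ascii);

# Flagged source with a character above 0x7f:
# byte length exceeds character length.
my $wide = "foo\x{100}";

print "ascii source: ", flag_state($ascii), ", substr: ", substr_flag($ascii), "\n";
print "wide source:  ", flag_state($wide),  ", substr: ", substr_flag($wide),  "\n";
```

Note that in both cases the two characters extracted are the same ("fo"); only the source string differs.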