in reply to Re^7: Seeking Perl docs about how UTF8 flag propagates
in thread Seeking Perl docs about how UTF8 flag propagates

That text is somewhat misleading. Note also that it does not mention the UTF8 flag; indeed, it explicitly says you "shouldn't worry" about what the internal format is.

Perl does not distinguish between text strings and binary strings, but programmers may do so when deciding how to interpret the contents of a string. Thus they may interpret input received from an outside source as a sequence of UTF-8 octets, and decide that they need to decode it to get the desired sequence of characters. The string "\x{c3}\x{9f}" is a sequence of two characters; if the programmer interprets it as a sequence of UTF-8 octets, they might choose to decode it to get the one-character string "\x{df}". Those are two different strings.
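As a rough illustration of that decoding decision, here is a minimal sketch using the core Encode module. The variable names are mine, and the "input" is just a literal standing in for data read from outside.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two characters, code points 0xC3 and 0x9F, standing in for outside input.
my $octets = "\x{c3}\x{9f}";

# The programmer's choice: interpret those characters as UTF-8 octets.
my $chars = decode('UTF-8', $octets);

printf "length before decode: %d\n", length $octets;
printf "length after decode:  %d\n", length $chars;
printf "decoded code point:   U+%04X\n", ord $chars;
print  "the two strings differ\n" if $octets ne $chars;
```

Nothing in Perl forced the decode; it is purely a consequence of how the programmer chose to interpret the input.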

However, the string "\x{c3}\x{9f}" has two different possible internal representations, one with and one without the UTF8 flag enabled. It is the same string - the same sequence of characters - regardless of the internal representation. The same is true of the two different possible internal representations of "\x{df}".
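One way to see this for yourself, sketched with utf8::upgrade and utf8::is_utf8 (both available without loading the utf8 pragma); the variable names are only for illustration:

```perl
use strict;
use warnings;

my $plain    = "\x{c3}\x{9f}";
my $upgraded = "\x{c3}\x{9f}";

# Switch one copy to the UTF8-flagged internal representation.
# This changes the storage, not the characters.
utf8::upgrade($upgraded);

printf "flag on plain:    %s\n", utf8::is_utf8($plain)    ? 'on' : 'off';
printf "flag on upgraded: %s\n", utf8::is_utf8($upgraded) ? 'on' : 'off';
printf "lengths:          %d and %d\n", length $plain, length $upgraded;
print  "same string\n" if $plain eq $upgraded;
```

The flag differs between the two copies, but length and eq see the same two characters in both.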

Any time the abstraction leaks out - any time you need to care about which internal representation is being used for the string - that's an example of the Unicode bug.
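A small sketch of one such leak, assuming a script that does not enable the unicode_strings feature (so no "use v5.12" or later), does not use locale, and puts no /u or /a modifier on the pattern:

```perl
use strict;
use warnings;
# Deliberately no "use feature 'unicode_strings'" here.

my $down = "\x{df}";
my $up   = "\x{df}";
utf8::upgrade($up);    # same characters, other internal representation

# Same pattern, same string value, yet the answer can depend on the flag.
my $down_is_word = $down =~ /\w/ ? 1 : 0;
my $up_is_word   = $up   =~ /\w/ ? 1 : 0;

print "strings are equal\n" if $down eq $up;
print "unflagged copy matches \\w: $down_is_word\n";
print "flagged copy matches \\w:   $up_is_word\n";
```

Under those assumptions the two copies compare equal but give different answers for \w. Enabling unicode_strings, or adding /u to the pattern, makes the two agree.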
