That text is somewhat misleading. Note also that it does not mention the UTF8 flag; indeed, it explicitly says you "shouldn't worry" about what the internal format is.

Perl does not distinguish between text strings and binary strings, but programmers may do so when deciding how to interpret the contents of a string. Thus they may interpret input received from an outside source as a sequence of UTF-8 octets, and decide that they need to decode it to get the desired sequence of characters. The string "\x{c3}\x{9f}" is a sequence of two characters; a programmer who interprets it as a sequence of UTF-8 octets might choose to decode it to get the one-character string "\x{df}". Those are two different strings.
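A minimal sketch of that decoding step, assuming the core Encode module is used for it (the variable names are just for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two characters, which the programmer chooses to interpret as UTF-8 octets.
my $octets = "\x{c3}\x{9f}";

# Decoding yields a different string: the single character U+00DF.
my $chars = decode('UTF-8', $octets);

printf "octets: length %d\n", length $octets;
printf "chars:  length %d\n", length $chars;
print  "decoded string is \\x{df}\n" if $chars eq "\x{df}";
print  "the two strings differ\n"    if $octets ne $chars;
```

Nothing here inspects the internal format; the difference between the two strings is visible purely at the character level.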

However, the string "\x{c3}\x{9f}" has two different possible internal representations, one with and one without the UTF8 flag enabled. It is the same string, the same sequence of characters, regardless of the internal representation. The same is true of the two different possible internal representations of "\x{df}".
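To see the two representations side by side, here is a small sketch that uses utf8::downgrade and utf8::upgrade to force each one, and utf8::is_utf8 to peek at the flag (the variable names are illustrative):

```perl
use strict;
use warnings;

my $plain   = "\x{c3}\x{9f}";
my $flagged = "\x{c3}\x{9f}";

utf8::downgrade($plain);    # internal representation without the UTF8 flag
utf8::upgrade($flagged);    # internal representation with the UTF8 flag

printf "plain:   flag=%d length=%d\n", (utf8::is_utf8($plain)   ? 1 : 0), length $plain;
printf "flagged: flag=%d length=%d\n", (utf8::is_utf8($flagged) ? 1 : 0), length $flagged;

# Same sequence of characters, so the strings compare equal.
print "same string\n" if $plain eq $flagged;
```

Both copies report a length of 2 and compare equal with eq; only the flag differs.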

Any time the abstraction leaks out - any time you need to care about which internal representation is being used for the string - that's an example of the Unicode bug.
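One well-known place where the abstraction leaks is regex character classes. The sketch below assumes a perl on an ASCII platform where neither the unicode_strings feature nor a locale is in effect (so no "use v5.12" or later, and no /u or /a on the pattern):

```perl
use strict;
use warnings;

my $down = "\x{df}";
my $up   = "\x{df}";

utf8::downgrade($down);    # no UTF8 flag
utf8::upgrade($up);        # UTF8 flag set

# The same one-character string in both variables.
my $same = ($down eq $up) ? 1 : 0;

# Under the default regex rules, \w consults the internal representation.
my $down_word = ($down =~ /\w/) ? 1 : 0;
my $up_word   = ($up   =~ /\w/) ? 1 : 0;

print "same string:        $same\n";
print "downgraded is \\w:   $down_word\n";
print "upgraded is \\w:     $up_word\n";
```

Under those assumptions the two copies compare equal, yet only the upgraded one matches \w. Enabling the unicode_strings feature, or adding /u to the pattern, should make both behave the same.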


In reply to Re^8: Seeking Perl docs about how UTF8 flag propagates by hv
in thread Seeking Perl docs about how UTF8 flag propagates by raygun
