I would be very surprised if there were normal cases where the UTF8-Flag isn't preserved when passing around. So no need to document the obvious.

It sounds like you've confused "expected" with "obvious."

"Widget can contain any ASCII character. This includes the semicolon." That second sentence is obvious—it's an easily deducible consequence of the first—so it need not (and should not) be stated.

Conversely, you can expect things to happen a certain way, but software can sometimes defy your expectation, as hv's remark below about substr demonstrates. That doesn't mean the software is misbehaving; it just means your expectation didn't align with it. In contrast, if an ASCII-holding object can't contain a semicolon, that's beyond unexpected: that's a bug.

And frankly, I don't think there is a clear expectation about whether things like substr and split, which create brand new strings out of pieces of existing strings, should blindly copy the UTF8 flag of their input.
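For anyone who wants to see what their own perl does rather than guess, here is a minimal sketch that reports the flag on the results of substr and split. The helper name flag_state and the sample string are mine, purely for illustration. I've deliberately left out expected output for the substr and split lines, since that is exactly the part the docs don't promise.

```perl
use strict;
use warnings;

# Report the internal UTF8 flag of a string as "on" or "off".
# (flag_state is a hypothetical helper name, not anything from the Perl docs.)
# utf8::is_utf8 is always available; no "use utf8" is needed to call it.
sub flag_state {
    my ($str) = @_;
    return utf8::is_utf8($str) ? 'on' : 'off';
}

# A literal containing a character above U+00FF is stored with the flag on.
my $source = "abc;def\x{263A}";

my $piece  = substr($source, 0, 3);   # ASCII-only slice of a flagged string
my @fields = split /;/, $source;      # "abc" and "def\x{263A}"

printf "source   flag %s\n", flag_state($source);
printf "substr   flag %s\n", flag_state($piece);
printf "split[%d] flag %s\n", $_, flag_state($fields[$_]) for 0 .. $#fields;

# The flag is not part of the string's value: the same three characters
# compare equal whether they are stored upgraded or downgraded.
my $up   = "abc";
my $down = "abc";
utf8::upgrade($up);
utf8::downgrade($down);
printf "upgraded copy flag %s, downgraded copy flag %s, eq: %s\n",
    flag_state($up), flag_state($down), ($up eq $down ? 'yes' : 'no');
```

Whatever the middle lines print on your build, the last line is the one to lean on: code that compares strings by value doesn't care about the flag, and code that branches on utf8::is_utf8 is relying on something the documentation doesn't currently pin down.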

Ideally, "expected" behavior would still be documented. (That statement strays into tautologyland: documenting things is how users know to expect them.) If the behavior is expected but still intentionally undefined, that fact ought to be documented too, so that coders know not to rely on it. The case at hand is neither, which to me suggests a shortfall in the documentation. A sentence or two in perlunicode about what's not guaranteed regarding the UTF8 flag would solve this, and let coders ensure their code isn't making unwarranted assumptions.


In reply to Re^6: Seeking Perl docs about how UTF8 flag propagates by raygun
in thread Seeking Perl docs about how UTF8 flag propagates by raygun
