Hello dear Unicode newbie,

You made one big mistake. Just one, so it's easy to fix. You assumed that you are supposed to look at the SvUTF8 flag, but you're not. It's an internal value, and because it's Perl you're allowed to look at its state. But you really shouldn't, if you want to keep your sanity.

Don't use is_utf8, okay? If you really want to know about internal flags, please use Devel::Peek's Dump function instead. It will print some extra useful internal values too, such as the other flags in Perl like NOK and IOK. For that matter, pretend that the UTF8 flag's name is UOK.

Better yet, pretend that the UTF8 flag does not exist. Perl picks an internal representation for numeric and string values automatically, and only in edge cases (or if you're dealing with internals or XS) do you need to know what is going on.

Read perlunitut and perlunifaq, and realise that you may sometimes need to use Unicode::Semantics (or utf8::upgrade) before text functions operate correctly.
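
For the curious, here is a minimal sketch of the kind of edge case where utf8::upgrade matters. It assumes a perl on which the unicode_strings feature is not in effect and no locale is in use; the variable names are only for illustration. Note that the two strings compare equal throughout: the upgrade changes how Perl stores the string, not which characters it contains.

```perl
use strict;
use warnings;
# Deliberately no "use v5.12" or "use feature 'unicode_strings'" here:
# with that feature on, the first uc() below behaves like the second.

my $s = "\xE9";        # one character, U+00E9 (e with acute accent)

my $before = uc $s;    # under the assumptions above, stays U+00E9

my $t = $s;
utf8::upgrade($t);     # same characters, different internal storage
my $after = uc $t;     # U+00C9, the uppercase form

printf "same string: %s\n", ($s eq $t ? "yes" : "no");
printf "uc before: U+%04X, uc after: U+%04X\n", ord $before, ord $after;
```

If you decode all your input properly, you will rarely need this, because decoded text already gets Unicode semantics.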

I think it's best if I don't explain what goes on in your code, and if you ignore explanations by others. Trying to understand what's going on internally is a nice exercise for when you know how to write good Unicode capable code, but not before that.

Decode your input, and encode your output. Don't query or set the SvUTF8 flag. Thanks!
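
A minimal sketch of that rule, using the core Encode module. The byte strings here are made-up sample data standing in for whatever you read from a file or socket, and the choice of UTF-8 on both sides is an assumption; use whatever encoding your input and output really have.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for "café", as read from outside.
my $bytes = "caf\xC3\xA9";

# Decode at the boundary: bytes in, text (characters) out.
my $text = decode('UTF-8', $bytes);

# From here on, work with characters and forget about flags.
my $upper = uc $text;                        # four characters: C A F U+00C9

# Joining with another text string needs no special care.
my $line = join ', ', $upper, "na\xEFve";    # 11 characters

# Encode at the other boundary: text in, bytes out.
my $out = encode('UTF-8', $line);

printf "input bytes: %d, characters: %d\n", length $bytes, length $text;
printf "joined characters: %d, output bytes: %d\n", length $line, length $out;
```

Nothing in there looks at or sets the flag, and nothing needs to.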

Best regards,

Juerd


In reply to Re: Problem with join'ing utf8 and non-utf8 strings (bug?) by Juerd
in thread Problem with join'ing utf8 and non-utf8 strings (bug?) by rsmah
