Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How does it relate to use utf8 and Unicode::Semantics::up? I have read the sparse documentation for the feature in 5.11.5; can someone explain this in deeper detail?

Replies are listed 'Best First'.
Re: use feature 'unicode_strings'
by ikegami (Patriarch) on Mar 05, 2010 at 16:03 UTC

    Some Perl operators are currently buggy. If a string consists of "à",

    • Sometimes uc will uppercase it, sometimes it won't.
    • Sometimes /\w/ will match, sometimes it won't.
    • etc.

    But for "a" and for "ā",

    • uc will uppercase it.
    • /\w/ will match.
    • etc.

    What's special about "à"? It's in the iso-8859-1 character set and outside of the ASCII character set, so the result depends on which internal storage format the string happens to be in. That's obviously not a good reason for the current misbehaviour.
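    A minimal sketch of the inconsistency, assuming a Perl with no unicode_strings feature and no locale in effect. The variable names are made up for illustration; the two strings compare equal and differ only in internal storage.

```perl
use strict;
use warnings;

# "\xE0" is à: inside iso-8859-1, outside ASCII.
my $native   = "\xE0";       # kept in Perl's one-byte-per-character format
my $upgraded = "\xE0";
utf8::upgrade($upgraded);    # same character, UTF-8 internal format

my $same        = $native eq $upgraded         ? "yes" : "no";
my $uc_native   = uc($native)   ne $native     ? "yes" : "no";
my $uc_upgraded = uc($upgraded) ne $upgraded   ? "yes" : "no";
my $w_native    = $native   =~ /\w/            ? "yes" : "no";
my $w_upgraded  = $upgraded =~ /\w/            ? "yes" : "no";

print "strings compare equal:  $same\n";
print "uc changes native:      $uc_native\n";
print "uc changes upgraded:    $uc_upgraded\n";
print "\\w matches native:      $w_native\n";
print "\\w matches upgraded:    $w_upgraded\n";
```

    If the "native" and "upgraded" lines disagree on your Perl, you are looking at the bug described above.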

    The default behaviour can't be changed for backwards compatibility reasons, so a pragma (use feature 'unicode_strings';) was added.

    Unicode::Semantics::up (aka utf8::upgrade) is a hack that addresses the same issue. However, it only affects one string, the effect is fleeting, and it forces the use of a less efficient storage format.
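    A rough sketch of the workaround's "one string only" limitation, using the core utf8::upgrade and utf8::is_utf8 functions. The variable names are hypothetical, and it assumes no unicode_strings feature is in scope.

```perl
use strict;
use warnings;

my $fixed     = "\xE0";    # à
my $untouched = "\xE0";    # another à that nobody upgraded

utf8::upgrade($fixed);     # changes the storage format of this one string only

my $fixed_flag     = utf8::is_utf8($fixed)     ? 1 : 0;
my $untouched_flag = utf8::is_utf8($untouched) ? 1 : 0;
my $fixed_uc       = sprintf "%02X", ord(uc($fixed));
my $untouched_uc   = sprintf "%02X", ord(uc($untouched));

print "fixed:     upgraded=$fixed_flag uc=0x$fixed_uc\n";
print "untouched: upgraded=$untouched_flag uc=0x$untouched_uc\n";
```

    Every string that reaches a case or regex operator would need the same treatment, which is why it's a hack rather than a fix.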

    The pragma fixes all broken operators, without the side effects. And it's enabled mostly automatically; all you need is use 5.012;.
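    A small sketch of the pragma at work on a string that is never upgraded. One caveat, as far as I recall from the feature documentation: 5.12 itself covered the case-changing operators, and regex matching followed in 5.14, so the \w line assumes 5.14 or later.

```perl
use strict;
use warnings;
use feature 'unicode_strings';    # also switched on by "use 5.012;" or later

my $s = "\xE0";                   # à, left in the native storage format

my $is_upgraded = utf8::is_utf8($s) ? 1 : 0;
my $uc_hex      = sprintf "%02X", ord(uc($s));
my $w_matches   = $s =~ /\w/ ? "yes" : "no";

print "upgraded:  $is_upgraded\n";    # the storage format is left alone
print "uc:        0x$uc_hex\n";
print "\\w match:  $w_matches\n";
```

    The feature is lexically scoped, so only code compiled inside the scope of the use line gets the new behaviour.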

      That was a pretty good explanation. What happens if I combine use utf8; and use 5.012;?
        You'll tell Perl your source is UTF-8 encoded, you want a version check for 5.12, and you want 5.12's backward-incompatible changes.
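        To illustrate that the two are independent, here is a sketch that assumes the file is saved as UTF-8. use utf8 decides how the literal in the source is read; use 5.012 decides how operators treat the string afterwards. The variable names are made up.

```perl
use 5.012;     # version check plus the 5.12 feature bundle (includes unicode_strings)
use strict;
use warnings;
use utf8;      # the source file itself is UTF-8 encoded

my $literal = "à";            # read as one character because of "use utf8"
my $native  = $literal;
utf8::downgrade($native);     # force the one-byte storage format on the copy

my $len     = length $literal;
my $same_uc = uc($literal) eq uc($native) ? 1 : 0;

say "length of literal: $len";
say "uc agrees across storage formats: ", ($same_uc ? "yes" : "no");
```

        Without use utf8 the literal would be read as two bytes; without use 5.012 the downgraded copy would not be uppercased.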
Re: use feature 'unicode_strings'
by Anonymous Monk on Mar 05, 2010 at 12:46 UTC
      My question was specifically about the module in the topic/title.
        How does the new solution to the old problem relate to the old solution? Please explain in deeper detail that which is already explained twice in very deep detail.