The fact of the matter is, if the pattern switched to the Unicode character scheme, then the pattern couldn't possibly match a single character in a UTF-8 string.

Why not?

If by "a single character" you mean a codepoint encoded in multiple bytes, then yes, it can; that's the default mode.

If by "a single character" you mean one byte of a multi-byte UTF-8 sequence, then yes, even that's possible with the \C escape (it's weird, but it is implemented). However, I can't find anything in the Unicode docs indicating that this is what's meant.
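The difference between matching a codepoint and matching a byte can be sketched outside Perl. The example below is only an analogy and assumes Python's `re` module, where a `str` pattern works on codepoints and a `bytes` pattern works on raw bytes; it is not how Perl implements this internally. On text, `.` matches one codepoint no matter how many UTF-8 bytes encode it. On the encoded bytes, `.` matches a single byte, which is roughly what `\C` exposes in Perl.

```python
import re

# Analogy only: Python str vs. bytes, standing in for Perl's character vs. byte view.
text = "\u00e9"                 # one codepoint, U+00E9 (e with acute accent)
encoded = text.encode("utf-8")  # the same codepoint as two UTF-8 bytes

# On a str, '.' matches one whole codepoint, however many bytes it takes in UTF-8.
char_match = re.fullmatch(r".", text)

# On bytes, '.' matches one byte, so the two-byte sequence yields two matches.
byte_matches = re.findall(rb".", encoded)

print(len(text), len(encoded))
print(char_match is not None)
print(byte_matches)
```

Running it shows one codepoint, two bytes, a successful single-character match on the text, and two separate single-byte matches on the encoded form.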

I think it would be nice if the person who wrote the perlunicode docs had adhered to the basic tenets of Unicode when describing Perl's state of Unicode awareness.

And where did he not? By the way, if you find places where the docs need improvement, don't whine about it; submit patches.

Or, was the idea to dumb down the docs and present factually incorrect descriptions so that beginners who think that Unicode characters are the same as UTF-8 characters are not confused?

I don't think so. I also don't see anything in the docs that is factually incorrect.

Perl 6 - links to (nearly) everything that is Perl 6.

In reply to Re: perl unicode docs by moritz
in thread perl unicode docs by 7stud
