Using the formula my $re = qr/^([\/\w]+)/; as the pattern has the same problems.

For clarity, the test script which I provided works just as well with this regex. The point is that it demonstrates that there is nothing wrong with the Perl code which does the regex matching, and therefore the only logical conclusion is that your data is not what you think it is.

Are you decoding your UTF-8 data when you read it from the data files in your script? If not, that is the problem.
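To illustrate the difference decoding makes, here is a minimal sketch. The sample string is made up for the illustration (it is not taken from your data files), and it uses Encode::decode on a byte string rather than reading a real file, so treat it as a demonstration of the principle and not as a diagnosis of your script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "/žaba" as raw UTF-8 bytes, which is what you get
# from a filehandle opened without an encoding layer.
# The letter ž (U+017E) is the two bytes 0xC5 0xBE.
my $bytes = "/\xC5\xBEaba";

# The same data decoded into Perl characters.
my $chars = decode('UTF-8', $bytes);

my $re = qr/^([\/\w]+)/;

my ($from_bytes) = $bytes =~ $re;
my ($from_chars) = $chars =~ $re;

# Report how much of each string the pattern consumed.
printf "undecoded: matched %d of %d\n", length $from_bytes, length $bytes;
printf "decoded:   matched %d of %d\n", length $from_chars, length $chars;
```

On the undecoded bytes the match stops at the first non-ASCII byte, because \w does not treat those bytes as word characters. On the decoded string the whole path matches. When reading from a file, the usual way to get decoded data is to open it with an encoding layer, e.g. open my $fh, '<:encoding(UTF-8)', $file.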

If you can provide a real SSCCE then I'm sure all will become clear.


🦛


In reply to Re^7: UTF8 versus \w in pattern matching by hippo
in thread UTF8 versus \w in pattern matching by mldvx4
