There's indeed a problem.
#!/usr/bin/perl use strict; use warnings; use encoding 'UTF-8'; #use encoding 'utf8'; #use utf8; use Devel::Peek qw( Dump ); my $word = "État"; # UTF-8 encoding of "État" my $char = "É"; # UTF-8 encoding of "É" Dump($word); print("String Length: ", length($word), "\n"); print("\n"); Dump($char); print("String Length: ", length($char), "\n"); print("\n"); if ($word =~ /$char/) { print "Matches\n"; } else { print "Does not match\n"; } if ($word =~ /\Q$char/) { print "Matches\n"; } else { print "Does not match\n"; } if (substr($word, 0, 1) eq $char) { print "Equal\n"; } else { print "Not equal\n"; }
SV = PV(0x22608c) at 0x225f9c REFCNT = 1 FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8) <-- That's good PV = 0x1822634 "\303\211tat"\0 [UTF8 "\x{c9}tat"] <-- That's good CUR = 5 LEN = 8 String Length: 4 <-- That's good SV = PV(0x2260a4) at 0x225f3c REFCNT = 1 FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8) <-- That's good PV = 0x22f43c "\303\211"\0 [UTF8 "\x{c9}"] <-- That's good CUR = 2 LEN = 4 String Length: 1 <-- That's good Does not match <-- WTF? Does not match <-- WTF? Equal <-- That's good

Replacing use encoding 'UTF-8'; with use encoding 'utf8'; yields the same results.

Replacing use encoding 'UTF-8'; with use utf8; produces the same dumps, but the matches succeed.

My suggestion:


In reply to Re^3: utf8, locale and regexp by ikegami
in thread utf8, locale and regexp by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.