Sixtease has asked for the wisdom of the Perl Monks concerning the following question:

Hello friends, decoding a string seems to break regex matching.
    use strict;
    use warnings;
    use encoding 'utf8';
    use Encode;

    my $raw = encode('cp1250', 'í');
    my $letter = decode('cp1250', $raw);

    if ($letter eq 'í') { print "yes\n" } else { print "no\n" }
    if ($letter =~ /í/) { print "yes\n" } else { print "no\n" }

    __OUTPUT__
    yes
    no
What does this mean? Is there a way to get it working?

Re: matching UTF8 chars with regex
by AltBlue (Chaplain) on May 19, 2007 at 02:37 UTC
    encoding won't automatically upgrade your strings unless you turn on its source filter feature: "use encoding 'utf8', Filter => 1;" should do it.

    --
    altblue.

      Yes! Yes! Thanks a lot! It works now... and as I drown myself in the encoding documentation, I see I've been doing some nasty stuff all along.
Re: matching UTF8 chars with regex
by Joost (Canon) on May 19, 2007 at 01:29 UTC
Re: matching UTF8 chars with regex
by Juerd (Abbot) on Jun 13, 2007 at 19:54 UTC

    "use encoding" is broken in several ways, and there will not be a fix soon. Stop using it if you can. And I'm quite sure that you can.

    If your source code is UTF-8 encoded, tell Perl by adding "use utf8;". If your input and output must be UTF-8 encoded, tell Perl by adding "binmode STDIN, ':encoding(UTF-8)'; binmode STDOUT, ':encoding(UTF-8)';".

    Just don't "use encoding", please.

    If you remove "use encoding 'utf8';" and add "use utf8;" in its place, your problem is gone.
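For illustration, here is a minimal sketch of the original test with "use encoding 'utf8';" swapped for "use utf8;" and an explicit output layer. It assumes the script file itself is saved as UTF-8; check_letter is just a hypothetical helper name used to wrap the two checks.

```perl
use strict;
use warnings;
use utf8;                        # assumption: this source file is saved as UTF-8
use Encode qw(encode decode);

# Encode output as UTF-8 so character strings print cleanly.
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: round-trips 'í' through cp1250 and reports
# whether the string comparison and the regex match succeed.
sub check_letter {
    my $raw    = encode('cp1250', 'í');    # byte string in cp1250
    my $letter = decode('cp1250', $raw);   # back to a character string
    my $eq     = $letter eq 'í' ? 1 : 0;
    my $match  = $letter =~ /í/ ? 1 : 0;
    return ($eq, $match);
}

my ($eq, $match) = check_letter();
print $eq    ? "yes\n" : "no\n";
print $match ? "yes\n" : "no\n";
```

With "use utf8;" both the string literal and the regex literal are read as characters rather than raw bytes, so both checks should print "yes".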

    Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }