in reply to Re^2: Odd problems with UTF-8, regexps, and newer Perl versions
in thread Odd problems with UTF-8, regexps, and newer Perl versions
Yes, as almut pointed out, my source is in UTF-8, so I do need the pragma.
But the plot thickens:
Going back to my original code, replacing "use encoding" with "use utf8" did not fix things. The original regular expression was much more complex, and it still dies. I've verified that even a slightly more complex RE still fails under "use utf8". It did seem a little "magical" that simply removing what should have been a harmless pragma made things work...
The modified example follows; I ran it on Perl 5.12.1. What am I missing? Your sage help is much appreciated!
#!/usr/bin/perl
use strict vars;
use utf8;
binmode STDOUT, ":utf8";

my $e = "Böck";
if (utf8::is_utf8($e)) {
    print "yep, is UTF8: $e\n";
}

# this succeeds (failed before with use encoding 'utf8', unknown why)
if ($e=~ m/.*?[x]$/) {
    print "matched simple\n";
}
print "success with simple\n";

# these die
if ($e=~ m/.*?\p{Space}$/) {
    print "matched medium\n";
}
print "success with medium\n";

if ($e=~ m/.*?[xyz]$/) {
    print "matched medium\n";
}
print "success with medium\n";

# the original, full expression. Naturally, this dies.
if ($e =~ m/(.*?)[,\p{isSpace}]+((?:\p{isAlpha}[\p{isSpace}\.]{1,2})+) +\p{isSpace}*$/) {
    print "matched complex\n";
}
print "success with complex\n";
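One way to narrow this down might be to run each pattern inside its own eval block, so that one failure is reported with its error message instead of ending the script. The following is only a sketch: try_match is a hypothetical helper that is not part of the script above, and it assumes the "death" is an ordinary Perl exception. A crash of the interpreter itself (a segfault, for example) would not be caught this way.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ":utf8";

# Hypothetical helper (not in the original script): compile and run one
# pattern inside eval, so a fatal error is reported rather than ending
# the program. The pattern is passed as a string so that compilation
# also happens inside the eval.
sub try_match {
    my ($label, $string, $pattern) = @_;
    my $matched = eval {
        my $re = qr/$pattern/;
        $string =~ $re ? 1 : 0;
    };
    # eval returns undef only if the block died; $@ then holds the message.
    return "$label: died: $@" unless defined $matched;
    return "$label: " . ($matched ? "matched" : "no match");
}

my $e = "Böck";
my @tests = (
    [ simple => '.*?[x]$' ],
    [ space  => '.*?\p{Space}$' ],
    [ class  => '.*?[xyz]$' ],
);
print try_match($_->[0], $e, $_->[1]), "\n" for @tests;
```

If a line reports "died", the text after it is whatever Perl put in $@, which should say more than the bare fact that the script stopped.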
Re^4: Odd problems with UTF-8, regexps, and newer Perl versions
by almut (Canon) on Jun 05, 2010 at 03:02 UTC