Hmm, I was sufficiently surprised by this behaviour (which I'd not heard of before) that I went looking. First off, your code fragment is not much use on its own, as it does not show what $R2 contains. So I went and looked at the source, and ripped the following out of its guts:
    use strict;
    use warnings;

    my @word = qw(
        constituci\xf3n contribuci\xf3n destituci\xf3n devoluci\xf3n disminuci\xf3n
        constituciones contribuciones destituciones devoluciones disminuciones
        foo
    );

    my $vowels       = 'aeiou\xe1\xe9\xed\xf3\xfa\xfc';
    my $consonants   = 'bcdfghjklmn\xf1pqrstvwxyz';
    my $revowel      = qr/[$vowels]/;
    my $reconsonants = qr/[$consonants]/;

    my $R2;
    my $suffix;

    for my $word (@word) {
        ($R2) = $word =~ /^.*?$revowel$reconsonants.*?$revowel$reconsonants(.*)$/;
        $R2 ||= '';

        if ( ($suffix) = $R2 =~ /(uciones|uci\xf3n)$/ ) {
            # uci\xf3n uciones
            # replace with u if in R2
            $word =~ s/$suffix$/u/;
            print "Step 1 case 4: $word\n";
        }
    }
(Those \xnn sequences really are Latin-1 characters in the actual source; the escapes are just an artifact of cutting and pasting from my shell.)
And that runs just fine here, all the way up to "perl, v5.11.0 DEVEL33323 built for i386-freebsd-64int". So there's something else going on. Both "ución" and "uciones" match just fine. Perhaps the tester platforms are running in a different locale. To play it safe, I suggest you encode your program in UTF-8 and slap a use utf8 at the top and be done with it. At least I think that's the correct best practice. Thinking about encoding makes my head explode.
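For what it's worth, here is a minimal sketch of what the UTF-8 route might look like. It assumes the file is saved as UTF-8 and that the terminal expects UTF-8 output. The sub name strip_ucion is made up for illustration and is not part of the module; it only covers the one "ución/uciones" step from the fragment above.

```perl
use strict;
use warnings;
use utf8;                               # string and regex literals below are UTF-8
binmode STDOUT, ':encoding(UTF-8)';     # assumption: the terminal wants UTF-8

my $vowel     = qr/[aeiouáéíóúü]/;
my $consonant = qr/[bcdfghjklmnñpqrstvwxyz]/;

# Hypothetical helper: replace a trailing "ución"/"uciones" with "u",
# but only when the suffix lies inside R2.
sub strip_ucion {
    my ($word) = @_;

    # R2 is whatever follows the second vowel-consonant pair, if any
    my ($r2) = $word =~ /^.*?$vowel$consonant.*?$vowel$consonant(.*)$/;
    $r2 = '' unless defined $r2;

    if ( $r2 =~ /(?:uciones|ución)$/ ) {
        $word =~ s/(?:uciones|ución)$/u/;
    }
    return $word;
}

print strip_ucion($_), "\n" for qw( constitución devoluciones foo );
```

With use utf8 in place the accented characters in the literals are treated as single characters rather than byte sequences, so the character classes should behave the same regardless of the locale the tests run under.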
• another intruder with the mooring in the heart of the Perl
In reply to Re: RegExp breaks in Perl 5.10
by grinder
in thread RegExp breaks in Perl 5.10
by jfraire