in reply to RegExp breaks in Perl 5.10
Hmm, I was sufficiently surprised by this behaviour (which I'd not heard of before) that I went looking. First off, your code fragment is not much use, as it does not show what $R2 contains. So I went and looked at the source, and ripped the following out of its guts:
    use strict;
    use warnings;

    my @word = qw(
        constituci\xf3n contribuci\xf3n destituci\xf3n devoluci\xf3n disminuci\xf3n
        constituciones contribuciones destituciones devoluciones disminuciones
        foo
    );

    my $vowels     = 'aeiou\xe1\xe9\xed\xf3\xfa\xfc';
    my $consonants = 'bcdfghjklmn\xf1pqrstvwxyz';
    my $revowel      = qr/[$vowels]/;
    my $reconsonants = qr/[$consonants]/;

    my $R2;
    my $suffix;

    for my $word (@word) {
        ($R2) = $word =~ /^.*?$revowel$reconsonants.*?$revowel$reconsonants(.*)$/;
        $R2 ||= '';

        if ( ($suffix) = $R2 =~ /(uciones|uci\xf3n)$/ ) {
            # uci\xf3n uciones
            # replace with u if in R2
            $word =~ s/$suffix$/u/;
            print "Step 1 case 4: $word\n";
        }
    }
(Those \xnn sequences really are Latin-1 characters in the original; the escapes are just an artifact of a direct cut'n'paste from my shell.)
And that runs just fine here, all the way up to "perl, v5.11.0 DEVEL33323 built for i386-freebsd-64int". So there's something else going on. Both "ución" and "uciones" match just fine. Perhaps the tester platforms are running in a different locale. To play it safe, I suggest you encode your program in UTF-8 and slap a use utf8 at the top and be done with it. At least I think that's the correct best practice. Thinking about encoding makes my head explode.
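For what it's worth, here is a rough sketch of what the UTF-8 route might look like. It assumes the file is saved as UTF-8, and it is not the module's actual code: the helper name stem_ucion is made up for illustration, and the binmode line is my own addition so that output is encoded explicitly instead of depending on the locale.

```perl
use strict;
use warnings;
use utf8;                              # string and regex literals below are UTF-8
binmode STDOUT, ':encoding(UTF-8)';    # assumption: we want UTF-8 on output

my $vowel     = qr/[aeiouáéíóúü]/;
my $consonant = qr/[bcdfghjklmnñpqrstvwxyz]/;

# Hypothetical helper: replace a trailing "ución"/"uciones" with "u",
# but only when the suffix lies inside R2 (same regex as the fragment above).
sub stem_ucion {
    my ($word) = @_;

    my ($r2) = $word =~ /^.*?$vowel$consonant.*?$vowel$consonant(.*)$/;
    $r2 = '' unless defined $r2;       # no R2 region, e.g. for "foo"

    if ( my ($suffix) = $r2 =~ /(uciones|ución)$/ ) {
        $word =~ s/\Q$suffix\E$/u/;
    }
    return $word;
}

print stem_ucion($_), "\n" for qw(constitución devoluciones foo);
```

On a terminal that displays UTF-8 this prints constitu, devolu and foo, one per line. With the source encoding declared this way, the character classes no longer depend on whatever locale the tester's machine happens to run in.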
• another intruder with the mooring in the heart of the Perl
Re^2: RegExp breaks in Perl 5.10
by almut (Canon) on Mar 06, 2008 at 21:13 UTC

Re^2: RegExp breaks in Perl 5.10
by eserte (Deacon) on Mar 06, 2008 at 20:52 UTC