in reply to Re: RegExp breaks in Perl 5.10
in thread RegExp breaks in Perl 5.10
I think the issue with the module's original code is that one side of the match has been decoded from UTF-8 (the word list read from the file) and so carries the utf8 flag, while the other side is in Latin-1 (the literal strings in the source). In your test case, both sides are in Latin-1, so they match.
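To make the difference between the two sides visible, here is a minimal sketch that inspects the utf8 flag on a Latin-1 byte string and on its decoded counterpart. The variable names and the sample word are mine, not taken from the module, and the script only shows the internal representation; it does not reproduce the 5.10 regex behaviour itself.

```perl
use strict;
use warnings;
use Encode ();

# A Latin-1 byte string, as a literal in a Latin-1 source file would be
# (no "use utf8" in effect). \xF3 is o-acute in ISO-8859-1.
my $latin1 = "constituci\xF3n";

# The same word after decoding, as the word list would be after being
# read and decoded; this turns the utf8 flag on.
my $decoded = Encode::decode("iso-8859-1", $latin1);

for my $pair ( [ literal => $latin1 ], [ decoded => $decoded ] ) {
    my ($label, $str) = @$pair;
    printf "%-8s utf8 flag %s, length %d\n",
        $label, ( utf8::is_utf8($str) ? "on" : "off" ), length $str;
}

# At the character level the two strings are still equal.
print "eq: ", ( $latin1 eq $decoded ? "yes" : "no" ), "\n";
```

Both strings should compare equal with eq and have the same length, with only the flag differing. That is why the mismatch inside the regex is surprising and points at how the match is carried out rather than at the data.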
When adding (at the beginning of the loop)
$word = Encode::decode("iso-8859-1", $word); # force utf8 flag on
print "$word:\n";
I can reproduce the problem, i.e. when forcing utf8, I get
constitución:
contribución:
destitución:
devolución:
disminución:
constituciones:
Step 1 case 4: constitu
contribuciones:
Step 1 case 4: contribu
destituciones:
Step 1 case 4: destitu
devoluciones:
Step 1 case 4: devolu
disminuciones:
Step 1 case 4: disminu
foo:
while with your original test, the output is
constitución:
Step 1 case 4: constitu
contribución:
Step 1 case 4: contribu
destitución:
Step 1 case 4: destitu
devolución:
Step 1 case 4: devolu
disminución:
Step 1 case 4: disminu
constituciones:
Step 1 case 4: constitu
contribuciones:
Step 1 case 4: contribu
destituciones:
Step 1 case 4: destitu
devoluciones:
Step 1 case 4: devolu
disminuciones:
Step 1 case 4: disminu
foo: