in reply to utf8 encoding bug?
This is a very strange bug. It appears to be happening because the replacement is coming from an array; witness the following code:
    #!/usr/bin/perl -w
    require 5.8.0;
    use strict;

    # Patterns (Latin-1 range) and replacements (Cyrillic), held in
    # plain scalars, array references, and arrays respectively.
    my($a1, $d1) = ("\x{00E0}", "\x{00E4}");
    my($a2, $d2) = ("\x{0430}", "\x{0434}");
    my($a3, $d3) = (["\x{0430}"], ["\x{0434}"]);
    my @a4 = "\x{0430}";
    my @d4 = "\x{0434}";

    for (\&t2, \&t3, \&t4, \&t5) {
        my $text = $d1.$a1;
        warn "Before = ", join('.', unpack ("U*", ${text})), "\n\n";
        &$_($text);
        warn "After = ", join('.', unpack ("U*", ${text})), "\n\n";
    }

    # Replacement comes from a plain scalar.
    sub t2 {
        $_[0] =~ s/$d1/$d2/g;
        $_[0] =~ s/$a1/$a2/g;
    }

    # Replacement comes from an element of an array reference.
    sub t3 {
        $_[0] =~ s/$d1/$d3->[0]/g;
        $_[0] =~ s/$a1/$a3->[0]/g;
    }

    # Replacement comes from an element of an array.
    sub t4 {
        $_[0] =~ s/$d1/$d4[0]/g;
        $_[0] =~ s/$a1/$a4[0]/g;
    }

    # Array element is copied into a scalar first, then used.
    sub t5 {
        my $a5 = $a4[0];
        my $d5 = $d4[0];
        $_[0] =~ s/$d1/$d5/g;
        $_[0] =~ s/$a1/$a5/g;
    }
The t3() and t4() calls fail for me under perl-5.8.0 and with recent development sources at patchlevel 18736. The very latest development sources (@18777) succeed for all four cases, so this has clearly been fixed by a very recent patch.
The success of t5() in the above code suggests a workaround: copy the replacement value into a scalar variable first, and use that scalar in the substitution.
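For code that performs many such replacements, the same workaround could be wrapped in a small helper. What follows is only a sketch: replace_all() and the @map table are hypothetical names that do not appear in the code above, and the \Q...\E quoting is an extra assumption, namely that the patterns are literal strings rather than regexes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): apply a list of
# [pattern, replacement] pairs, copying each value into a plain
# lexical scalar before it is interpolated into s///.
sub replace_all {
    my ($text, @pairs) = @_;
    for my $pair (@pairs) {
        my $from = $pair->[0];    # scalar copy of the pattern
        my $to   = $pair->[1];    # scalar copy of the replacement
        $text =~ s/\Q$from\E/$to/g;
    }
    return $text;
}

# Same characters as in the test script: a-diaeresis and a-grave
# mapped to their Cyrillic counterparts.
my @map = (["\x{00E4}", "\x{0434}"], ["\x{00E0}", "\x{0430}"]);
my $out = replace_all("\x{00E4}\x{00E0}", @map);
print join('.', unpack("U*", $out)), "\n";
```

On a perl where the substitution behaves correctly this should print 1076.1072, the code points of the two Cyrillic replacement characters.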
Hugo