$line is still encoded: it holds UTF-8 bytes, not decoded characters. A character won't match the UTF-8 encoding of that character unless it's an ASCII character.
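To illustrate the point, here is a minimal sketch (not taken from your script) that uses the core Encode module to compare an encoded string with its decoded form. The sample text and variable names are made up for the demonstration.

```perl
use strict;
use warnings;
use Encode qw( decode );

# U+03B4 (GREEK SMALL LETTER DELTA) is encoded in UTF-8 as the two bytes CE B4.
# This is what a read returns when the handle has no :encoding layer.
my $encoded = "\xCE\xB4 = 0.5\n";

my $re = qr/\x{03b4}/;

# Against the raw bytes, the pattern sees "characters" CE and B4, never U+03B4.
my $encoded_matches = $encoded =~ $re ? 1 : 0;

# After decoding, the two bytes become the single character U+03B4.
my $decoded = decode('UTF-8', $encoded);
my $decoded_matches = $decoded =~ $re ? 1 : 0;

printf "encoded: length %d, %s\n", length($encoded), $encoded_matches ? 'match' : 'no match';
printf "decoded: length %d, %s\n", length($decoded), $decoded_matches ? 'match' : 'no match';
```

Only the decoded string matches. Note that a character below U+0100 such as "\x{00a9}" can appear to match encoded input, because its UTF-8 encoding (C2 A9) happens to contain the byte A9; the substitution would then leave a stray C2 byte behind.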
You had the right idea with -C / use open / binmode. The catch is that you're not reading from STDIN, you're reading from ARGV, and those work poorly, if at all, with ARGV.
The solution: Don't use ARGV.
    # Map each decoded character to its LaTeX replacement.
    my %fixes = (
        "\x{00a9}" => '\\textcopyright',
        "\x{2010}" => '-',
        "\x{fffd}" => '\\,',
        "\x{03b4}" => '$\\delta$',
        "\x{00c5}" => '\\AA{}',
    );

    # One alternation that matches any of the keys.
    my ($re) = map qr/$_/, join '|', map quotemeta, keys(%fixes);

    # Mimic <>: read the named files, or STDIN if none were given.
    @ARGV = '-' if !@ARGV;
    for my $ARGV (@ARGV) {
        my $fh;
        if ($ARGV eq '-') {
            open($fh, '<&:encoding(UTF-8)', *STDIN)
                or die("Can't dup STDIN: $!\n");
        } else {
            open($fh, '<:encoding(UTF-8)', $ARGV)
                or die("Can't open \"$ARGV\": $!\n");
        }

        for (;;) {
            # Check eof first so that an undefined read means an error.
            last if eof($fh);
            defined( my $line = <$fh> )
                or die("Can't read from \"$ARGV\": $!\n");
            $line =~ s/($re)/$fixes{$1}/g;
            print $line;
        }
    }
Yeah, it sucks. Especially since ARGV normally does that error checking for you.
In reply to Re: Search & replace of UTF-8 characters ?
by ikegami
in thread Search & replace of UTF-8 characters ?
by levien