in reply to How to Fix Character Encoding Damaged Text Using Perl?
Since you already know what sequence of encoding and decoding led to the broken output, the easiest way with Encode::Repair is this:
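```perl
use 5.010;
use strict;
use warnings;
use Encode::Repair qw(repair_encoding);

# assumes this file is saved as UTF-8 without 'use utf8',
# so $broken holds the raw UTF-8 bytes of the literal
my $broken = '敒›剕䕇呎';
say repair_encoding($broken, [decode => 'utf-8', encode => 'UTF-16LE']);

__END__
# output: Re: URGENT
```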
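For comparison, here is a minimal sketch of a repair using only the core Encode module, without Encode::Repair. It first reproduces the damage (the bytes of 'Re: URGENT' read as UTF-16LE, so every two ASCII bytes become one CJK-looking character) and then reverses it. It assumes the original text was plain ASCII/UTF-8, and it builds the broken string itself, so it does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Reproduce the damage: the bytes of an ASCII string misread as UTF-16LE.
my $original = 'Re: URGENT';
my $broken   = decode('UTF-16LE', encode('UTF-8', $original));

# Undo it by running the steps backwards:
# turn the characters back into UTF-16LE bytes, then read those bytes as UTF-8.
my $repaired = decode('UTF-8', encode('UTF-16LE', $broken));

binmode STDOUT, ':encoding(UTF-8)';
print "$broken\n";      # 敒›剕䕇呎
print "$repaired\n";    # Re: URGENT
```

The repair is the damage run in reverse: encode with the encoding that was wrongly used for decoding, then decode with the correct one.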
It also works with learn_recoding, which works out the pattern from a known pair of broken and correct strings:
```perl
use 5.010;
use strict;
use warnings;
use Encode::Repair qw(repair_encoding learn_recoding);
binmode STDOUT, ':encoding(UTF-8)';

my $broken = '敒›剕䕇呎';
my $pattern = learn_recoding(
    from      => $broken,
    to        => 'Re: URGENT',
    encodings => ['UTF-8', 'UTF-16LE', 'UTF-16BE'],
);
if ($pattern) {
    say repair_encoding($broken, $pattern);
}
```
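The idea of learning a recoding from a known pair can be illustrated with a brute-force search. This is a minimal sketch using only core Encode; find_recoding is a hypothetical helper, not part of Encode::Repair and not how the module is implemented, and it only tries a single encode/decode step.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not Encode::Repair's code): try every
# "encode as $enc, then decode as $dec" pair and return the first
# one that turns the broken string into the expected one.
sub find_recoding {
    my ($broken, $wanted, @encodings) = @_;
    # FB_CROAK: die on invalid input, so eval skips impossible guesses.
    # LEAVE_SRC: stop encode/decode from modifying their input in place.
    my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;
    for my $enc (@encodings) {
        for my $dec (@encodings) {
            next if $enc eq $dec;
            my $try = eval {
                my $bytes = encode($enc, $broken, $check);
                decode($dec, $bytes, $check);
            };
            return ($enc, $dec) if defined $try && $try eq $wanted;
        }
    }
    return;    # no single-step pattern found
}

# The broken string from above, written as code points.
my $broken = "\x{6552}\x{203A}\x{5255}\x{4547}\x{544E}";
my ($enc, $dec) = find_recoding($broken, 'Re: URGENT', 'UTF-8', 'UTF-16LE', 'UTF-16BE');
print defined $enc
    ? "encode as $enc, then decode as $dec\n"    # encode as UTF-16LE, then decode as UTF-8
    : "no single-step fix found\n";
```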
So, what did you try?
(Updated to use pre tags instead of code tags, because code tags badly break most non-ASCII characters.)
Re^2: How to Fix Character Encoding Damaged Text Using Perl?
by Jim (Curate) on Jun 15, 2013 at 18:37 UTC