Since you already know which sequence of encoding and decoding steps led to the broken output, the easiest way to fix it with Encode::Repair is this:
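use 5.010;
use strict;
use warnings;
use Encode::Repair qw(repair_encoding);

# No 'use utf8' here, so $broken holds the UTF-8 bytes of the mojibake
my $broken = '敒›剕䕇呎';

# Decode those bytes as UTF-8, then encode the characters as UTF-16LE
say repair_encoding($broken, [decode => 'utf-8', encode => 'UTF-16LE']);

__END__
# output: Re: URGENT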
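The repair_encoding call above can also be done by hand with the core Encode module. This is a minimal sketch, assuming the mojibake is already a decoded character string (written below as code point escapes so the script does not depend on its own source encoding), which is why the leading decode => 'utf-8' step is not needed here:

```perl
use 5.010;
use strict;
use warnings;
use Encode qw(encode decode);

# The mojibake as characters: U+6552 U+203A U+5255 U+4547 U+544E
my $broken = "\x{6552}\x{203A}\x{5255}\x{4547}\x{544E}";

# Each character is one UTF-16LE code unit holding two ASCII bytes,
# so encoding as UTF-16LE gives back the original bytes ...
my $bytes = encode('UTF-16LE', $broken);

# ... and decoding those bytes as UTF-8 turns them into text again.
my $fixed = decode('UTF-8', $bytes);

say $fixed;    # prints: Re: URGENT
```

Note that UTF-16LE (with explicit byte order) does not emit a BOM, so no stray bytes end up in front of the repaired text.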
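It also works with learn_recoding, which looks for a sequence of steps over the given encodings that turns the broken string into the expected one: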
use 5.010;
use strict;
use warnings;
use Encode::Repair qw(repair_encoding learn_recoding);

binmode STDOUT, ':encoding(UTF-8)';

my $broken = '敒›剕䕇呎';

# Find a decode/encode sequence that turns $broken into the expected text
my $pattern = learn_recoding(
    from      => $broken,
    to        => 'Re: URGENT',
    encodings => ['UTF-8', 'UTF-16LE', 'UTF-16BE'],
);

# Only repair if a pattern was found
if ($pattern) {
    say repair_encoding($broken, $pattern);
}
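To illustrate the idea behind learn_recoding, here is a minimal brute-force sketch using only the core Encode module. It is not Encode::Repair's actual implementation: apply_pattern and find_pattern are hypothetical helpers, and the search only tries two-step patterns (one decode plus one encode, in either order) over the candidate encodings.

```perl
use 5.010;
use strict;
use warnings;
use Encode ();

# Hypothetical helper: apply a flat [op => encoding, ...] list in order.
# FB_CROAK makes invalid input die; LEAVE_SRC keeps $str unmodified.
sub apply_pattern {
    my ($str, $pattern) = @_;
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    for (my $i = 0; $i < @$pattern; $i += 2) {
        my ($op, $enc) = @{$pattern}[$i, $i + 1];
        $str = $op eq 'decode'
            ? Encode::decode($enc, $str, $check)
            : Encode::encode($enc, $str, $check);
    }
    return $str;
}

# Hypothetical helper: try every two-step pattern and return the first
# one that turns $from into $to, or nothing if none does.
sub find_pattern {
    my ($from, $to, @encodings) = @_;
    for my $ops (['decode', 'encode'], ['encode', 'decode']) {
        for my $first (@encodings) {
            for my $second (@encodings) {
                my $pattern = [$ops->[0] => $first, $ops->[1] => $second];
                # Wrong guesses usually die under FB_CROAK; skip them
                my $result = eval { apply_pattern($from, $pattern) };
                return $pattern if defined $result && $result eq $to;
            }
        }
    }
    return;
}

# Same input as in the post: the UTF-8 bytes of the mojibake
my $broken = Encode::encode('UTF-8',
    "\x{6552}\x{203A}\x{5255}\x{4547}\x{544E}");

my $pattern = find_pattern($broken, 'Re: URGENT',
    'UTF-8', 'UTF-16LE', 'UTF-16BE');

if ($pattern) {
    say join ' ', @$pattern;
    say apply_pattern($broken, $pattern);
}
```

Making each step croak on invalid input is what lets the search reject most wrong guesses quickly; damage that needs more than two steps (for example, UTF-8 that was encoded twice) would need a deeper search.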
So, what did you try?
(Updated to use <pre> tags instead of <code> tags, because <code> tags badly break most non-ASCII characters.)
In reply to Re: How to Fix Character Encoding Damaged Text Using Perl? by moritz, in thread How to Fix Character Encoding Damaged Text Using Perl? by Jim