LWP doesn't decode anything unless you use ->decoded_content. If you use ->content, you get the raw bytes returned by the web server. By using '>:encoding(iso-8859-5)', you are re-encoding chars that have already been encoded using windows-1251. That makes no sense. You need to undo the first encoding before encoding again.
    use Encode qw( decode from_to );

    # Each block below is a separate alternative; use only one.

    # Outputs windows-1251 text
    open my $fh, '>', $qfn;
    print $fh $response->content;

    # Outputs iso-8859-5 text
    open my $fh, '>', $qfn;
    my $content = $response->content;
    from_to($content, 'windows-1251', 'iso-8859-5');
    print $fh $content;

    # Outputs iso-8859-5 text
    open my $fh, '>:encoding(iso-8859-5)', $qfn;
    print $fh decode('windows-1251', $response->content);

    # Outputs iso-8859-5 text, assuming
    # the content encoding is detected.
    open my $fh, '>:encoding(iso-8859-5)', $qfn;
    print $fh $response->decoded_content;
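To see the difference without a web server, here is a minimal sketch that uses only the core Encode module. The `$raw` string is a hypothetical stand-in for `$response->content` (the word "Привет" as windows-1251 bytes), and `hex_of` is a helper made up for this illustration. The "wrong" case calls `encode` directly, which makes the same mistake as printing raw bytes to an `:encoding(iso-8859-5)` handle, although the exact bytes the layer writes may differ.

```perl
use strict;
use warnings;
use Encode qw( decode encode from_to );

# Hypothetical stand-in for $response->content:
# the word "Привет" as windows-1251 bytes.
my $raw = "\xCF\xF0\xE8\xE2\xE5\xF2";

# Illustrative helper: render a byte string as space-separated hex.
sub hex_of {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

# Correct: undo the first encoding, then encode again.
my $correct = encode('iso-8859-5', decode('windows-1251', $raw));

# Equivalent: convert a copy of the bytes in place.
my $in_place = $raw;
from_to($in_place, 'windows-1251', 'iso-8859-5');

# Wrong: encode the raw bytes as if they were already decoded text.
# Each byte is treated as a Latin-1 character, none of which
# iso-8859-5 can represent, so the Cyrillic text is lost.
my $double = encode('iso-8859-5', $raw);

print "raw (windows-1251): ", hex_of($raw),      "\n";
print "decode + encode:    ", hex_of($correct),  "\n";  # BF E0 D8 D2 D5 E2
print "from_to:            ", hex_of($in_place), "\n";
print "double-encoded:     ", hex_of($double),   "\n";
```

The first two conversions should agree byte for byte; the last one should not.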
So, given the '>:encoding(iso-8859-5)' output layer,
    my $file = $response->content;
should be one of
    my $file = $response->decoded_content;
or
    my $file = decode('windows-1251', $response->content);
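"(iso-8859-5 is the iso name for windows-1251)"
No. They're quite different. Both windows-1251 and iso-8859-5 are single-byte Cyrillic encodings, but they assign the letters to different byte values. For example, "А" (U+0410) is 0xC0 in windows-1251 and 0xB0 in iso-8859-5.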
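A quick way to check that the two encodings differ is to encode the same character both ways and compare the bytes. This is only a sketch using the core Encode module; the variable names are made up for the illustration.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# CYRILLIC CAPITAL LETTER A
my $letter = "\x{0410}";

# The same character becomes a different byte in each encoding.
my $cp1251_byte = encode('windows-1251', $letter);
my $iso_byte    = encode('iso-8859-5',   $letter);

printf "windows-1251: 0x%02X\n", ord($cp1251_byte);   # 0xC0
printf "iso-8859-5:   0x%02X\n", ord($iso_byte);      # 0xB0

# Reading windows-1251 bytes as if they were iso-8859-5
# silently yields a different letter (U+0420 instead of U+0410).
my $misread = decode('iso-8859-5', $cp1251_byte);
printf "misread as:   U+%04X\n", ord($misread);       # U+0420
```

Because the misread byte still maps to a valid Cyrillic letter, this kind of mix-up produces plausible-looking but wrong text rather than an error.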
In reply to "Re: Downloading webpages with non-ASCII characters" by ikegami, in thread "Downloading webpages with non-ASCII characters" by CountZero