Thanks for the suggestion, but that thread deals with a slightly different problem.
I decided to use a snippet like this:
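use Encode qw(decode);

sub decoded_content {
    my $self = shift;
    my $c;
    # First try the normal decoding and return it if it succeeds.
    eval { $c = $self->SUPER::decoded_content(raise_error => 1) } && return $c;
    # Otherwise take the undecoded bytes and decode them leniently.
    $c = $self->SUPER::decoded_content(charset => 'none');
    $c = decode('utf8', $c, Encode::FB_PERLQQ());
    # FB_PERLQQ marks each malformed byte as \xHH (hex); strip those markers.
    $c =~ s/\\x[0-9A-Fa-f]{2}//g;
    $c;
}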
That gets me what I want. Now the question is: how do I tell LWP to return a response object that uses this sub?
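One possible way to wire such an override in, offered as a sketch rather than the canonical LWP approach: put the sub in a subclass of HTTP::Response and rebless each response into it from a response_done handler. The package name My::LenientResponse is hypothetical, and the override here is a simplified stand-in that assumes the body is meant to be UTF-8 and lets Encode substitute U+FFFD for malformed bytes.

```perl
use strict;
use warnings;

package My::LenientResponse;    # hypothetical subclass name
use parent 'HTTP::Response';
use Encode ();

sub decoded_content {
    my ($self, %opt) = @_;
    # Undo any Content-Encoding, but leave the charset undecoded.
    my $bytes = $self->SUPER::decoded_content(%opt, charset => 'none');
    return undef unless defined $bytes;
    # Assumption: the body is UTF-8. With the default CHECK value,
    # malformed bytes are replaced by U+FFFD instead of raising an error.
    return Encode::decode('UTF-8', $bytes);
}

package main;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Rebless every completed response so that later calls to
# decoded_content use the override above.
$ua->add_handler(
    response_done => sub {
        my ($response) = @_;
        bless $response, 'My::LenientResponse';
        return;
    }
);
```

With the handler in place, `$ua->get($url)->decoded_content` should go through the override. If you would rather not install a handler, calling `bless $response, 'My::LenientResponse'` on a single response after the request has the same effect for that object.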
I checked your link (http://acus.org/new_atlanticist/sarkozy-delays-university-reforms-feared-greek-style-riots) and it does contain a malformed UTF-8 sequence: a stray 0x96 byte. That byte can't be decoded as UTF-8, which is why decoded_content fails. Also, you explicitly passed raise_error, so it dies when it can't decode the message. Instead, try getting the raw content with HTTP::Message::content and decoding it yourself with Encode::decode.
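A minimal sketch of that suggestion, using only Encode so it runs without a network connection. The helper name bytes_to_text and the sample string are made up for illustration; in real code the bytes would come from `$response->content`.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes as UTF-8 without dying.
# With the default CHECK value, Encode replaces each malformed
# byte with U+FFFD (the replacement character).
sub bytes_to_text {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

# Stand-in for $response->content: ASCII text with a stray 0x96 byte.
my $raw  = "reforms \x96 riots";
my $text = bytes_to_text($raw);

# The stray byte is now a single U+FFFD character.
printf "found replacement character at offset %d\n", index($text, "\x{FFFD}");
```

Passing Encode::FB_CROAK as the third argument would instead make decode die on the bad byte, which is roughly the strict behaviour you are seeing now.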
I'm not sure what your code looks like, but LWP has a callback mechanism that is usually used for monitoring progress. You could possibly use it to decode your content.
It is up to you to open a file and write the data as it comes in; you could filter it there.
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;

# don't buffer the prints, so the status line updates immediately
$| = 1;

my $ua = LWP::UserAgent->new();
my $received_size = 0;
my $url = 'http://www.cpan.org/authors/id/J/JG/JGOFF/parrot-0_0_7.tgz';
print "Fetching $url\n";
my $request_time = time;
my $last_update = 0;
my $response = $ua->get($url,
    ':content_cb'     => \&callback,
    ':read_size_hint' => 8192,
);
print "\n";

sub callback {
    my ($data, $response, $protocol) = @_;
    my $total_size = $response->header('Content-Length') || 0;
    $received_size += length $data;

    ############################################
    # Here you write the $data to a filehandle or whatever should happen
    # with it here, like do your decoding.
    ############################################

    my $time_now = time;
    # update the status at most once per second
    return unless $time_now > $last_update or $received_size == $total_size;
    $last_update = $time_now;
    print "\rReceived $received_size bytes";
    printf " (%i%%)", (100/$total_size)*$received_size if $total_size;
    printf " %6.1f bytes/s", $received_size/(($time_now-$request_time)||1)
        if $received_size;
}
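One caveat if you decode inside the callback: a chunk can end in the middle of a multi-byte character, so decoding each chunk on its own may report errors that are not really in the data. A cautious sketch is to collect the raw bytes in the callback and decode once at the end. The names collect_chunk and decoded_text are hypothetical, and the two chunks below simulate what a :content_cb callback might receive.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $raw = '';    # accumulates the undecoded bytes

# Hypothetical stand-in for the body of a :content_cb callback.
sub collect_chunk {
    my ($data) = @_;
    $raw .= $data;
    return;
}

# Decode only after the whole body has arrived.
sub decoded_text {
    return decode('UTF-8', $raw);
}

# Simulate two chunks that split the two-byte UTF-8 sequence
# for U+00E9 (e with acute accent) across the chunk boundary.
collect_chunk("caf\xC3");
collect_chunk("\xA9 au lait");

my $text = decoded_text();
printf "decoded %d characters\n", length $text;
```

This gives up decoding as the data streams in, but it keeps the callback simple and avoids having to carry partial characters from one chunk to the next.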