
Thanks for the suggestion, but that thread deals with a slightly different kind of problem.

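I decided to use a snippet like this:

    # needs: use Encode qw(decode);
    sub decoded_content {
        my $self = shift;
        my $c;
        # Try a strict decode first and return it if it succeeds.
        eval { $c = $self->SUPER::decoded_content(raise_error => 1) } && return $c;
        # Otherwise skip the charset step and decode leniently by hand.
        $c = $self->SUPER::decoded_content(charset => 'none');
        $c = decode('utf8', $c, Encode::FB_PERLQQ());
        # FB_PERLQQ marks each malformed byte as \xHH (two hex digits); strip those markers.
        $c =~ s/\\x[0-9A-Fa-f]{2}//g;
        $c;
    }

That gets me what I want. Now the question is: how do I tell LWP to return a response object that uses this sub?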

Re^3: LWP charset problem
by zwon (Abbot) on Jan 05, 2009 at 18:58 UTC

    I've checked your link (http://acus.org/new_atlanticist/sarkozy-delays-university-reforms-feared-greek-style-riots), and it really does contain a malformed UTF-8 character: a byte with code 0x96. The message can't be decoded correctly, which is why decoded_content fails. You also explicitly asked for an error to be raised (raise_error) if the message can't be decoded. Instead, try getting the content with HTTP::Message::content and decoding it with Encode::decode.
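
A minimal sketch of that suggestion, assuming the page is meant to be UTF-8. The response object is built by hand and the sample bytes are made up for illustration, so the snippet runs without a network; with a real request you would use the object returned by $ua->get instead. With its default CHECK value, Encode::decode replaces each malformed byte with U+FFFD rather than dying.

```perl
use strict;
use warnings;
use HTTP::Response;
use Encode qw(decode);

# Hand-built stand-in for the object $ua->get($url) would return.
# The body is illustrative: valid UTF-8 for "café", then a stray 0x96 byte.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=utf-8' ],
    "caf\xC3\xA9 \x96 ok",
);

# content() returns the raw bytes; no charset decoding is attempted.
my $bytes = $response->content;

# Default CHECK (0): malformed bytes become U+FFFD instead of raising an error.
my $text = decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

If the server also compresses the body (Content-Encoding), calling decoded_content(charset => 'none'), as in the snippet earlier in the thread, should hand back the uncompressed bytes to pass to decode.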

Re^3: LWP charset problem
by zentara (Cardinal) on Jan 05, 2009 at 18:20 UTC
    I'm not sure what your code looks like, but LWP has a callback mechanism that is usually used for monitoring progress. You may be able to use it to decode your content. It is up to you to open a file and write the data as it comes in; you could filter it there.
    #!/usr/bin/perl -w
    use strict;
    use LWP::UserAgent;

    # don't buffer the prints to make the status update
    $| = 1;

    my $ua = LWP::UserAgent->new();
    my $received_size = 0;
    my $url = 'http://www.cpan.org/authors/id/J/JG/JGOFF/parrot-0_0_7.tgz';
    print "Fetching $url\n";
    my $request_time = time;
    my $last_update = 0;
    my $response = $ua->get($url,
        ':content_cb'     => \&callback,
        ':read_size_hint' => 8192,
    );
    print "\n";

    sub callback {
        my ($data, $response, $protocol) = @_;
        my $total_size = $response->header('Content-Length') || 0;
        $received_size += length $data;

        ############################################
        # Here you write the $data to a filehandle or whatever should happen
        # with it here, like do your decoding.
        ############################################

        my $time_now = time;
        # this to make the status only update once per second.
        return unless $time_now > $last_update or $received_size == $total_size;
        $last_update = $time_now;
        print "\rReceived $received_size bytes";
        printf " (%i%%)", (100/$total_size)*$received_size if $total_size;
        printf " %6.1f/bps", $received_size/(($time_now-$request_time)||1)
            if $received_size;
    }
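
One caveat if you decode inside such a callback: a multi-byte UTF-8 character can be split across two chunks, so decoding each chunk on its own can report errors that are not really in the page. A rough sketch of one way around that, assuming the whole body fits in memory, is to only collect the bytes in the callback and decode once at the end. The name collect_chunk and the sample chunks are made up for illustration, and the callback is called by hand here so the snippet runs without a network.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $raw = '';    # undecoded bytes collected so far

# Takes the same arguments LWP passes to a :content_cb handler.
sub collect_chunk {
    my ($data, $response, $protocol) = @_;
    $raw .= $data;    # just accumulate; do not decode partial data
    return;
}

# Simulated chunks: the two bytes of "é" (C3 A9) arrive in separate calls,
# and the second chunk also carries a stray 0x96 byte.
collect_chunk("caf\xC3");
collect_chunk("\xA9 \x96 ok");

# Decode once everything has arrived; malformed bytes become U+FFFD.
my $text = decode('UTF-8', $raw);

binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

In a real script the handler would be registered as ':content_cb' => \&collect_chunk in the $ua->get call, just like the progress callback above.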

    I'm not really a human, but I play one on earth Remember How Lucky You Are