Re^2: HTTP::Response decoded

Re^3: HTTP::Response decoded_content catch 22 by Your Mother (Archbishop) on Aug 09, 2008 at 15:44 UTC
```perl
use LWP::UserAgent;
use HTML::TreeBuilder;

my $ua = LWP::UserAgent->new();
my $response = $ua->get("http://www.barackobama.com/2008/08/07/obama_talks_about_reviving_eco.php");
# print $response->decoded_content;
my $html = $response->decoded_content();
my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;    # signal end of input so the tree is finalized
print $tree->as_HTML;
```

That gives no errors for me either, and it fetches and parses the content fine, both on LWP 5.805 with Perl 5.10 and on LWP 5.814 with Perl 5.8.8. Time to upgrade?
Re^4: HTTP::Response decoded_content catch 22 by ikegami (Patriarch) on Aug 09, 2008 at 15:58 UTC
It still exists in the latest version. The docs say:

(W) The first chunk parsed appears to contain undecoded UTF-8 and one or more argspecs that decode entities are used for the callback handlers.

The result of decoding will be a mix of encoded and decoded characters for any entities that expand to characters with code above 127. This is not a good thing.

The solution is to use the Encode::encode_utf8() on the data before feeding it to the $p->parse(). For $p->parse_file() pass a file that has been opened in ":utf8" mode.

The parser can process raw undecoded UTF-8 sanely if the `utf8_mode` is enabled or if the "attr", "@attr" or "dtext" argspecs is avoided.

It could be that the server didn't specify the character encoding of the content.
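To illustrate the "server didn't specify the character encoding" case, here is a minimal sketch of the decoding step the warning is about: turning raw bytes into a Perl character string before the parser sees them (decoding is the direction that produces characters from bytes). It uses only the core `Encode` module. The helper name `decode_html_bytes`, the sample markup, and the UTF-8 fallback are assumptions for illustration, not part of LWP or HTML::Parser, and the right fallback depends on the site.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw response bytes into a character string.
# $charset is whatever the server declared; falling back to UTF-8 when it
# declared nothing is an assumption, not documented LWP behaviour.
sub decode_html_bytes {
    my ($bytes, $charset) = @_;
    $charset = 'UTF-8' unless defined $charset && length $charset;
    return decode($charset, $bytes);
}

# Sample input: an e-acute as two UTF-8 bytes, plus an entity for the
# same character, which is the mix the warning describes.
my $raw_bytes = "<p>caf\xC3\xA9 &eacute;</p>";
my $html      = decode_html_bytes($raw_bytes, undef);

# The two-byte sequence collapses into one character after decoding.
printf "bytes: %d, characters: %d\n", length($raw_bytes), length($html);
```

The decoded string can then be handed to `$tree->parse($html)` followed by `$tree->eof`, so the literal character and the expanded entity end up in the same representation.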
Re^4: HTTP::Response decoded_content catch 22 by cormanaz (Deacon) on Aug 09, 2008 at 16:28 UTC
Well, I upgraded to 5.814 (from 5.805), and that does seem to have fixed the problem. Thanks.