cormanaz has asked for the wisdom of the Perl Monks concerning the following question:
Problem is that one (but not all) of the web sites returns content in UTF-8, and for this I get a warning "Parsing of undecoded UTF-8 will give garbage when decoding entities at foobar.pl line 15."

    my $html = $response->decoded_content();
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);

But wait, I need to have the HTML parsed into a tree before I can find the tag that gives me the charset!
I realize I could extract the raw HTML and use a regexp or something to hunt for the content-type tag before parsing the tree, but I'm wondering if there's a more elegant solution to this.
TIA...Steve
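
For reference, the regexp approach mentioned in the question could look roughly like the sketch below. It is only an illustration, not the fix the replies settled on: `sniff_charset` and `decode_html` are hypothetical helper names, the 4096-byte window and the ISO-8859-1 fallback are assumptions, and `$raw` is a hard-coded stand-in for the undecoded bytes you would get from `$response->content`. Only the core `Encode` module is used.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper: look for a charset declaration near the top of the
# raw (undecoded) HTML bytes. Returns the charset name, or undef if none.
sub sniff_charset {
    my ($bytes) = @_;
    my $head = substr($bytes, 0, 4096);   # assumption: meta tags sit near the top
    # Matches <meta charset="..."> as well as
    # <meta http-equiv="Content-Type" content="text/html; charset=...">
    if ($head =~ /<meta\b[^>]*?\bcharset\s*=\s*["']?\s*([\w.:-]+)/i) {
        return $1;
    }
    return;
}

# Hypothetical helper: turn raw bytes into Perl characters, falling back to
# ISO-8859-1 (an assumption) when no usable charset is declared.
sub decode_html {
    my ($bytes) = @_;
    my $charset = sniff_charset($bytes);
    $charset = 'ISO-8859-1' unless defined $charset && find_encoding($charset);
    return decode($charset, $bytes);
}

# Stand-in for $response->content (raw bytes, not decoded_content).
my $raw = qq{<html><head><meta http-equiv="Content-Type" }
        . qq{content="text/html; charset=utf-8"></head>}
        . qq{<body>caf\xC3\xA9</body></html>};

my $html = decode_html($raw);
# $html now holds characters and could be handed to $tree->parse($html).
```

Once the charset is known, another option is to pass it to `decoded_content` through its `charset` option instead of calling `Encode::decode` yourself; check the HTTP::Message documentation for your installed version before relying on that.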
Re: HTTP::Response decoded_content catch 22
by ikegami (Patriarch) on Aug 09, 2008 at 00:41 UTC

Re: HTTP::Response decoded_content catch 22
by Your Mother (Archbishop) on Aug 08, 2008 at 23:00 UTC
by cormanaz (Deacon) on Aug 09, 2008 at 15:18 UTC
by Your Mother (Archbishop) on Aug 09, 2008 at 15:44 UTC
by ikegami (Patriarch) on Aug 09, 2008 at 15:58 UTC
by cormanaz (Deacon) on Aug 09, 2008 at 16:28 UTC