Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:
```perl
my $tree = HTML::TreeBuilder->new_from_content($page);
$tree->elementify();
```
at which point it dies, saying "Parsing of undecoded UTF-8 will give garbage when decoding entities at /Library/Perl/HTML/TreeBuilder.pm line 96."
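The post doesn't show where $page comes from. If it was read from a file, Perl hands back raw bytes unless the filehandle has a decoding layer, and raw UTF-8 bytes are exactly what the message calls "undecoded". The sketch below uses only core modules; the temporary file and its contents are made up for illustration, and it assumes no default I/O layers are in effect (for example from PERL_UNICODE or `use open`).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input: a tiny HTML file holding the UTF-8 bytes for "café".
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode $fh;                          # write raw bytes, no translation
print {$fh} "<p>caf\xc3\xa9</p>";
close $fh;

# Read with no layer: every byte becomes one "character" (undecoded UTF-8).
open my $raw, '<', $filename or die "Cannot open $filename: $!";
my $undecoded = do { local $/; <$raw> };
close $raw;

# Read through :encoding(UTF-8): Perl decodes the bytes into characters.
open my $dec, '<:encoding(UTF-8)', $filename or die "Cannot open $filename: $!";
my $decoded = do { local $/; <$dec> };
close $dec;

printf "raw read:     %d characters\n", length $undecoded;
printf "decoded read: %d characters\n", length $decoded;
```

The raw read yields 12 characters and the decoded read 11, because the two bytes of "é" collapse into a single character. If the page came from LWP instead, HTTP::Response's decoded_content method plays a similar role.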
Now my first problem is that line 96 is the rather unedifying "$new->parse($whunk);". So, after a certain amount of trial and error, I tracked that down to one of the dependent modules, HTML::Parser, whose documentation tells me that "The solution is to use the Encode::encode_utf8() on the data before feeding it to the $p->parse()".
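The two Encode function names are easy to mix up: `decode_utf8` turns UTF-8 bytes into a Perl character string, while `encode_utf8` goes the other way. Applied to data that is already bytes, `encode_utf8` double-encodes it instead of decoding it, and the result is still undecoded UTF-8, which would be consistent with the error not going away. For what it's worth, current HTML::Parser documentation phrases this advice in terms of `Encode::decode_utf8()`. A minimal sketch using only the core Encode module; the sample string is made up:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# "café" as raw UTF-8 bytes, the way it might arrive from a file or socket.
my $bytes = "caf\xc3\xa9";

# decode_utf8: bytes -> characters ("é" becomes one character).
my $chars = decode_utf8($bytes);

# encode_utf8: characters -> bytes. Fed bytes, it treats \xc3 and \xa9 as
# two Latin-1 characters and encodes each, i.e. it double-encodes.
my $double = encode_utf8($bytes);

for my $pair ([bytes => $bytes], [chars => $chars], [double => $double]) {
    my ($label, $s) = @$pair;
    printf "%-6s length %d, UTF8 flag %s\n",
        $label, length $s, utf8::is_utf8($s) ? 'on' : 'off';
}
```

Only `$chars` comes out as a flagged character string (4 characters); `$double` is 7 bytes of mojibake that still looks like undecoded UTF-8 to a parser.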
So I do this:
```perl
use Encode;
use HTML::TreeBuilder;

$page = Encode::encode_utf8($page);
my $tree = HTML::TreeBuilder->new_from_content($page);
$tree->elementify();
```
But it doesn't seem to make any difference. Same error.
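One way to avoid this class of mistake is to normalise the input before it ever reaches the parser. The helper below, to_character_string, is hypothetical (it is not part of HTML::Parser or HTML::TreeBuilder): it leaves strings that already carry Perl's UTF8 flag alone, passes pure ASCII through, and otherwise assumes the data is UTF-8 bytes and decodes it strictly. The flag test is only a heuristic; a character string containing nothing above U+00FF may lack the flag, and byte input in some other encoding (Latin-1, say) would be misread or rejected.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper, not part of HTML::Parser or HTML::TreeBuilder:
# return a Perl character string that is safe to hand to $p->parse().
sub to_character_string {
    my ($page) = @_;    # private copy, so decode() may modify it freely

    # Already flagged as a character string: nothing to do.
    return $page if utf8::is_utf8($page);

    # Pure ASCII reads the same whether treated as bytes or characters.
    return $page unless $page =~ /[^\x00-\x7F]/;

    # Otherwise ASSUME the data is UTF-8 bytes. FB_CROAK makes malformed
    # input die instead of being silently replaced with U+FFFD.
    return decode('UTF-8', $page, FB_CROAK);
}

# Usage
my $decoded = to_character_string("caf\xc3\xa9");         # 4 characters
my $ascii   = to_character_string("plain ascii");         # returned unchanged
my $ok      = eval { to_character_string("\xff\xfe"); 1 }; # false: malformed input croaked
```

The returned string could then be passed to HTML::TreeBuilder->new_from_content in place of the raw $page.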
So I guess I have three questions:
($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print
Re: HTML::Tree problems with UTF-8 Content.
by ikegami (Patriarch) on Jul 16, 2005 at 06:20 UTC
Re: HTML::Tree problems with UTF-8 Content.
by GrandFather (Saint) on Jul 16, 2005 at 06:32 UTC
Re: HTML::Tree problems with UTF-8 Content.
by graff (Chancellor) on Jul 16, 2005 at 14:42 UTC