parse_html_file

Similar to parse_file() but parses HTML (strict) documents; $htmlfile can be filename or URL.

$doc = $parser->parse_html_file( $htmlfile, \%opts );

An optional second argument can be used to pass some options to the HTML parser as a HASH reference. Possible options are: encoding and URI for libxml2 < 2.6.27, and for later versions of libxml2 additionally: recover, suppress_errors, suppress_warnings, pedantic_parser, no_blanks, and no_network.
So you probably want something like
my $dom = $parser->parse_html_string($htmlfromlwp, { encoding => 'iso8859-1' } );
since setting the encoding after the fact (that is, after parsing) doesn't result in re-parsing.
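A minimal, self-contained sketch of that idea follows. The sample string is hypothetical and merely stands in for raw bytes fetched with LWP: it is Latin-1 and carries no charset declaration, so the parser has to be told the encoding up front. The sketch spells the encoding as iso-8859-1 (the IANA name); whether other spellings are accepted depends on the underlying libxml2/iconv build.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input: Latin-1 bytes (\xE9 is e-acute) with no charset
# declaration, standing in for the undecoded content fetched with LWP.
my $htmlfromlwp = "<html><body><p>caf\xE9</p></body></html>";

my $parser = XML::LibXML->new;

# Pass the encoding at parse time; changing it on the resulting
# document afterwards would not re-interpret the input bytes.
my $dom = $parser->parse_html_string(
    $htmlfromlwp,
    { encoding => 'iso-8859-1' },
);

# Text pulled out of the DOM is a Perl character string, so encode it
# explicitly on output.
my $text = $dom->findvalue('//p');
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

If the page does declare its charset in a meta tag, or the string has already been decoded to Perl characters, the explicit option may be unnecessary or even wrong, so treat this as an illustration for the undeclared-bytes case only.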
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
In reply to Re: XML::LibXML encoding problem by shmem, in thread XML::LibXML encoding problem by user2000