If you leave $root->utf8_mode(0) in effect (it's the default) and you pass decoded text (character strings, not UTF-8 bytes) to parse, you'll get decoded text back from HTML::Element. When you output the new HTML, encode it as you would any other text.
use strict;
use warnings;
use HTML::TreeBuilder;

# my $decoded_html = $http_response->decoded_content();
my $decoded_html = <<"__EOI__";
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Foo</title>
</head>
<body>\xC9ric</body>
</html>
__EOI__

my $t = HTML::TreeBuilder->new();
$t->parse($decoded_html);
$t->eof();

my $val = ( $t->content_list() )[1]->as_text();

binmode STDOUT, ":encoding(UTF-8)";
print(<<"__EOI__");
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Foo</title>
</head>
<body>Extracted $val</body>
</html>
__EOI__
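If you start from raw bytes rather than from decoded_content(), the same rule applies: decode once on the way in, encode once on the way out. Here is a minimal sketch of that boundary using only the core Encode module. The variable names ($raw_bytes, $decoded, $out_bytes) are made up for illustration, and the explicit encode() call is just an alternative to the binmode layer used above, not something the parser requires.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical input: the UTF-8 encoded bytes of "Éric",
# as they might arrive from a file or a socket.
my $raw_bytes = "\xC3\x89ric";

# Decode once at the input boundary. This decoded string is
# the kind of text you would hand to parse().
my $decoded = decode('UTF-8', $raw_bytes);

# Work on characters: "É" now counts as one character, not two bytes.
my $char_count = length($decoded);

# Encode once at the output boundary, instead of using a binmode layer.
my $out_bytes = encode('UTF-8', "Extracted $decoded");

print $out_bytes, "\n";
```

Use either the :encoding(UTF-8) layer or an explicit encode() on a given handle, not both, or the output ends up double-encoded.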
In reply to Re: Parsing UTF-8 HTML w/ HTML::Parser by ikegami, in thread Parsing UTF-8 HTML w/ HTML::Parser by Purdy