It sounds like HTML::TreeBuilder isn't handling the UTF-8 encoding correctly. However, I think there's an easy solution, hinted at by the documentation for HTML::TreeBuilder and open.
HTML::TreeBuilder:
    $root = HTML::TreeBuilder->new()
    $root->parse_file(...)
    "An important method inherited from HTML::Parser, which see. Current versions of HTML::Parser can take a filespec, or a filehandle object, like *FOO, or some object from class IO::Handle, IO::File, IO::Socket) or the like. I think you should check that a given file exists before calling $root->parse_file($filespec)."
Ok, so it accepts file handles? Good...
open:
    open(my $fh, "<:encoding(UTF-8)", "filename") || die "can't open UTF-8 encoded filename: $!";
Ok, so we can specify which encoding to use when we open a file? Hmm!
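To see what that encoding layer actually changes, here is a minimal sketch using only core modules. The temp file, the sample string and the first_line helper are made up for illustration; the point is just that the same file yields undecoded bytes through :raw and decoded characters through :encoding(UTF-8).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: "caf\x{e9}" is 4 characters but 5 bytes in UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':encoding(UTF-8)');
print {$out} "caf\x{e9}\n";
close($out) or die "can't close $path: $!";

# Illustrative helper: read the first line of $file through the given PerlIO layer.
sub first_line {
    my ($layer, $file) = @_;
    open(my $fh, "<$layer", $file) or die "can't open $file: $!";
    my $line = <$fh>;
    close($fh);
    chomp $line;
    return $line;
}

my $raw     = first_line(':raw', $path);              # undecoded bytes
my $decoded = first_line(':encoding(UTF-8)', $path);  # decoded characters

printf "raw: %d, decoded: %d\n", length($raw), length($decoded);
```

The raw read reports a length of 5 and the decoded read a length of 4, which is the difference a parser sees when it is handed an already-decoded filehandle instead of a plain filename.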
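So here's what I'd try:
    use HTML::TreeBuilder::XPath;

    # Open with a decoding layer so the parser receives characters, not raw bytes.
    open(my $fh, "<:encoding(UTF-8)", $yourOriginalFileName)
        or die "can't open $yourOriginalFileName: $!";
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_file($fh);
Untested, but hopefully it helps.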
Edit: I made a weird mistake in my code (as well as in the links). Fixed, I hope.
In reply to Re: transforming html by muba, in thread transforming html by morgon