in reply to character problem
Note that the parsing result will likely not be valid if raw undecoded UTF-8 is used as a source. Have you done this?
When parsing UTF-8 encoded files, turn on UTF-8 decoding:
open(my $fh, "<:utf8", "index.html") || die "Can't open 'index.html': $!";
my $p = HTML::TokeParser->new( $fh );
"Battle not with trolls, lest ye become a troll; and if you gaze into the Internet, the Internet gazes also into you."
-Friedrich Nietzsche: A Dynamic Translation