in reply to character problem

From the HTML::TokeParser docs:
Note that the parsing result will likely not be valid if raw undecoded UTF-8 is used as a source.
When parsing UTF-8 encoded files turn on UTF-8 decoding:

use HTML::TokeParser;

open(my $fh, "<:utf8", "index.html") || die "Can't open 'index.html': $!";
my $p = HTML::TokeParser->new( $fh );

Have you done this?
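If you want to see why the decoding layer matters, here is a small sketch that uses only core modules, so HTML::TokeParser isn't needed. It assumes a hypothetical sample string "café" stored as UTF-8 and reads it through an in-memory filehandle, once without a layer and once with one. Without the layer, the two bytes of é come through as two separate characters, and that is what the parser would be handed. The sketch uses `:encoding(UTF-8)` rather than `:utf8` because it also validates the input. Either layer does the decoding the docs describe.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: "café" as it would sit in a UTF-8 encoded file
# (é is stored as the two bytes 0xC3 0xA9).
my $bytes = encode('UTF-8', "caf\x{e9}");

# No decoding layer: every byte is read back as its own character.
open(my $raw, '<', \$bytes) or die "Can't open in-memory handle: $!";
my $undecoded = <$raw>;
close $raw;

# With a decoding layer: the two bytes are combined into the single character é.
open(my $dec, '<:encoding(UTF-8)', \$bytes) or die "Can't open in-memory handle: $!";
my $decoded = <$dec>;
close $dec;

printf "undecoded: %d chars, decoded: %d chars\n",
    length $undecoded, length $decoded;
# prints "undecoded: 5 chars, decoded: 4 chars"
```

The same thing happens with a real file. If you open it without a decoding layer, or pass the filename straight to the constructor, the parser sees the bytes instead of the characters.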

            "Battle not with trolls, lest ye become a troll; and if you gaze into the Internet, the Internet gazes also into you."
        -Friedrich Nietzsche: A Dynamic Translation