
I don't see any method to get the "raw" text either.

In any case, if the output encoding doesn't matter to you, just open the output file in UTF-8 mode and declare the matching charset in the HTML file itself. The declaration isn't needed if the file is served by a web server that already sends the correct Content-Type header with a charset:

    open my $out, ">:utf8", $filename or die $!;
    # print head start
    print $out q(<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">);
    # print rest of head and document
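To see why the :utf8 layer fixes the "scrambled" quotes, here is a minimal sketch. It assumes get_text has already expanded &rsquo; into the single character U+2019 (HTML::TokeParser decodes entities in text it returns from get_text), and it writes to an in-memory filehandle instead of a real $filename so the resulting bytes can be inspected. The names $text, $buffer and $line are just for illustration.

```perl
use strict;
use warnings;

# Assumption: this is what get_text hands back for "It&rsquo;s",
# i.e. the entity already decoded to U+2019 RIGHT SINGLE QUOTATION MARK.
my $text = "It\x{2019}s";

# Write through a :utf8 layer into an in-memory buffer (stands in for the real file).
my $buffer = '';
open my $out, '>:utf8', \$buffer or die "Cannot open in-memory handle: $!";
print $out q(<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">), "\n";
print $out $text, "\n";
close $out or die "Cannot close handle: $!";

# The second line of the buffer holds the encoded text; dump its bytes in hex.
my (undef, $line) = split /\n/, $buffer;
print join(' ', map { sprintf '%02X', ord } split //, $line), "\n";
# prints: 49 74 E2 80 99 73  (U+2019 becomes the UTF-8 bytes E2 80 99)
```

Those three bytes are only displayed as a single curly quote when the browser knows the page is UTF-8, which is what the META line (or the server's charset header) tells it. Without the :utf8 layer, Perl prints the character with a "Wide character in print" warning, and the page can look garbled if it is declared or sniffed as some other charset.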