Ah, so it is the other way round, and very odd indeed. Running your code I get the same result,
and I don't have UTF-8 in my LANG setting, either. It seems that HTML::TokeParser turns on the
UTF-8 flag on strings returned by the get_text() method:
#!/usr/bin/perl
use HTML::TokeParser;
#use Data::Dump::Streamer;
use strict;
use Devel::Peek;
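local $/;                                   # slurp mode: read all of DATA at once
my $lines = <DATA>;
my $tok_par = HTML::TokeParser->new(\$lines);
my $tok_inf = $tok_par->get_token;          # consume the <title> start tag
my $tok_typ = shift @{$tok_inf};
my $title = $tok_par->get_text() || "<NO TITLE FOUND>";
Dump($title);                               # show the SV internals, including flags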
__DATA__
<title>egrave: è : eacute: é : rsquo: ’ : lsquo: &lsquo;</title>
__END__
SV = PV(0x81b4290) at 0x81ed950
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
PV = 0x8207b90 "egrave: \303\250 : eacute: \303\251 : rsquo: \342\200\231 : lsquo: \342\200\230"\0 [UTF8 "egrave: \x{e8} : eacute: \x{e9} : rsquo: \x{2019} : lsquo: \x{2018}"]
CUR = 49
LEN = 52
That's why you see the right output on your UTF-8 terminal at home, but garbled output on the server's terminal.
Hmm. I call that a bug :-)
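If you want to sidestep the terminal guessing, one option is to encode the string yourself before printing. What follows is only a rough sketch using the core Encode module. The `$title` literal is a hand-built stand-in for what get_text() returned above, and ISO-8859-1 as the server terminal's encoding is purely my assumption; substitute whatever your terminal really uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for the get_text() result: a character string holding a
# code point above 255, so perl stores it with the UTF8 flag on.
my $title = "egrave: \x{e8} : rsquo: \x{2019}";

print "UTF8 flag is ", (utf8::is_utf8($title) ? "on" : "off"), "\n";

# Encode explicitly instead of letting perl dump its internal bytes.
# ISO-8859-1 is an assumption; characters it cannot represent
# (such as U+2019) are replaced with '?' under the default CHECK.
my $latin1 = encode('ISO-8859-1', $title);
my $utf8   = encode('UTF-8', $title);

printf "%d characters, %d bytes as Latin-1, %d bytes as UTF-8\n",
    length($title), length($latin1), length($utf8);
```

Printing `$latin1` on a Latin-1 terminal, or `$utf8` on a UTF-8 one, should then show the accented letters correctly; the curly quote survives only in the UTF-8 version.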
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
|