use HTML::TokeParser;
use strict;

local $/;
my $lines = <DATA>;
my $tok_par = HTML::TokeParser->new(\$lines);
my $tok_inf = $tok_par->get_token;
my $tok_typ = shift @{$tok_inf};
print "Type: $tok_typ \n";
my $title = $tok_par->get_text() || "<NO TITLE FOUND>";
print "Title: $title \n";
__END__
<title>egrave: &egrave; : eacute: &eacute; : rsquo: &rsquo; : lsquo: &lsquo;</title>

I've now tested this at home, and with my web host. At home it works as it should:
Title: egrave: è : eacute: é : rsquo: ’ : lsquo: ‘
At the web host it produces the results previously described:
Title: egrave: è : eacute: é : rsquo: ’ : lsquo: ‘
In case it makes a difference, at home I have:
This is perl, v5.8.8 built for i586-linux-thread-multi
and the web host has:
This is perl, v5.8.5 built for i386-linux-thread-multi
Do you know whether this behaviour is due to a difference between 5.8.5 and 5.8.8? Thank you for any further advice!
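
To compare the two environments beyond the perl version, something like the following could be run on both machines. It is only a sketch of where to look, not a diagnosis: it assumes the HTML::Parser distribution (which provides HTML::TokeParser and HTML::Entities) is installed on both, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser   ();
use HTML::Entities ();

# Report interpreter and module versions so the two machines can be compared.
printf "perl:           %vd\n", $^V;
printf "HTML::Parser:   %s\n", HTML::Parser->VERSION;
printf "HTML::Entities: %s\n", HTML::Entities->VERSION;

# Decode one of the problem entities directly, bypassing TokeParser,
# and show what comes back as a code point rather than as raw output.
my $source  = '&rsquo;';
my $decoded = HTML::Entities::decode_entities($source);
printf "rsquo decodes to U+%04X (length %d)\n", ord($decoded), length $decoded;
```

If the module versions or the reported code point differ between home and the web host, that would point at the installed HTML::Parser rather than at perl 5.8.5 versus 5.8.8 itself.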
In reply to Re^2: HTML::TokeParser, get_text scrambling rsquo and lsquo by tridral, in thread HTML::TokeParser, get_text scrambling rsquo and lsquo by tridral