You have a UTF-8-encoded string. You need to convert it to Perl's native Unicode string format (which also happens to be UTF-8-encoded internally, but is marked with a special flag so that multibyte sequences are treated as single characters).
You can do this like:
utf8::decode($string);
The utf8::decode function works in-place (like chomp), so you can just call it in a void context.
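To see the in-place behaviour, here is a minimal sketch. The sample string is my own invention, not taken from your page: it holds the three UTF-8 bytes for U+2019 between some ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical sample input: "\xE2\x80\x99" is the three-byte UTF-8
# encoding of U+2019 (RIGHT SINGLE QUOTATION MARK).
my $string = "it\xE2\x80\x99s";

my $before = length $string;          # counts bytes: 6
my $ok     = utf8::decode($string);   # modifies $string in place
my $after  = length $string;          # counts characters: 4

printf "%d -> %d, ok=%d, U+%04X\n",
    $before, $after, $ok ? 1 : 0, ord substr($string, 2, 1);
# prints: 6 -> 4, ok=1, U+2019
```

The return value is only a success flag. The decoded text ends up in $string itself, which is why calling it in void context is fine.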
That said, you won't find a \x92 character on the page you linked to, because there is none. There's a \x{2019} character though.
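One possible reason for expecting \x92 (this is a guess on my part, not something the page tells us) is that byte 0x92 is the right single quote in Windows-1252. A short sketch using the core Encode module shows that both byte forms decode to the same character:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed inputs for illustration: the same curly apostrophe as it
# would appear in a Windows-1252 document and in a UTF-8 document.
my $cp1252_bytes = "\x92";
my $utf8_bytes   = "\xE2\x80\x99";

my $from_cp1252 = decode('cp1252', $cp1252_bytes);
my $from_utf8   = decode('UTF-8',  $utf8_bytes);

printf "cp1252 92     -> U+%04X\n", ord $from_cp1252;   # U+2019
printf "UTF-8  E28099 -> U+%04X\n", ord $from_utf8;     # U+2019
```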
The following fetches the page content and makes ASCII control characters and non-ASCII characters visible:
use 5.010001;
use LWP::UserAgent;

my $url = 'http://publib.boulder.ibm.com/infocenter/brjrules/v7r0m3/basic/tocView.jsp?toc=/com.ibm.websphere.ilog.jrules.doc/toc.xml';

# Fetch the raw bytes, then decode them from UTF-8 in place.
my $content = LWP::UserAgent->new->get($url)->content;
utf8::decode($content);

# Replace control characters (except tab, LF, CR) and all non-ASCII
# characters with a visible [U+XXXX] marker.
$content =~ s { ([\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x{1FFFFF}]) }
              { sprintf('[U+%04X]', ord($1)) }gex;

print $content;
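If you want to try the substitution without hitting the network, here is a self-contained sketch of the same idea. The make_visible name and the sample string are hypothetical, and I have capped the range at \x{10FFFF} (the highest Unicode code point) rather than \x{1FFFFF}.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the same kind of substitution.
sub make_visible {
    my ($text) = @_;
    $text =~ s{([\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x{10FFFF}])}
              {sprintf('[U+%04X]', ord($1))}ge;
    return $text;
}

# Sample input: UTF-8 bytes for a curly apostrophe, plus a BEL character.
my $sample = "it\xE2\x80\x99s\x07";
utf8::decode($sample);

my $visible = make_visible($sample);
print $visible, "\n";   # prints: it[U+2019]s[U+0007]
```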
In reply to Re: converting smart quotes by tobyink, in thread converting smart quotes by slugger415