in reply to converting smart quotes
You have a UTF-8-encoded string. You need to convert it to Perl's native Unicode string format (which also happens to be utf8-encoded internally, but marked with a special flag so that multibyte sequences are treated as single characters).
You can do that with:
utf8::decode($string);
The utf8::decode function works in-place (like chomp), so you can just call it in a void context.
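To see the in-place behaviour for yourself, here's a minimal sketch. The three-byte sample string is my own (it's the UTF-8 encoding of U+2019), not something taken from your page:

```perl
use strict;
use warnings;

# Three raw bytes: the UTF-8 encoding of U+2019 (sample data, chosen for illustration).
my $string = "\xE2\x80\x99";
my $before = length($string);        # counts bytes at this point

# Decodes in place; the return value is true when the bytes were valid UTF-8.
my $ok = utf8::decode($string);
my $after = length($string);         # now counts characters

print "before: $before, after: $after\n";
printf "U+%04X\n", ord($string);
```

This should print `before: 3, after: 1` followed by `U+2019`.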
That said, you won't find a \x92 character on the page you linked to, because it doesn't contain one. It does contain a \x{2019} character (RIGHT SINGLE QUOTATION MARK), though.
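I don't know where your \x92 expectation comes from, but one common source is Windows-1252, where byte 0x92 is the same right single quote. A small sketch using the core Encode module shows how the two relate (the variable names are just for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# In Windows-1252, byte 0x92 maps to U+2019.
my $from_cp1252 = decode('cp1252', "\x92");
printf "cp1252 0x92 decodes to U+%04X\n", ord($from_cp1252);

# In UTF-8, U+2019 is three bytes, and none of them is 0x92.
my $utf8_bytes = encode('UTF-8', "\x{2019}");
my $hex = join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;
print "U+2019 as UTF-8 bytes: $hex\n";
```

So a search for \x92 can only match data that is still in a single-byte Windows encoding, not a decoded (or UTF-8-encoded) copy of the same text.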
The following fetches the page content and makes ASCII control characters and non-ASCII characters visible:
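    use 5.010001;
    use LWP::UserAgent;

    my $url = 'http://publib.boulder.ibm.com/infocenter/brjrules/v7r0m3/basic/tocView.jsp?toc=/com.ibm.websphere.ilog.jrules.doc/toc.xml';

    # Fetch the raw (still UTF-8-encoded) response body.
    my $content = LWP::UserAgent->new->get($url)->content;

    # Turn the UTF-8 bytes into a Perl character string, in place.
    utf8::decode($content);

    # Replace control characters (except tab, LF and CR) and everything
    # above ASCII with a visible [U+XXXX] marker.
    $content =~ s {
        ([\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x{1FFFFF}])
    } {
        sprintf('[U+%04X]', ord($1))
    }gex;

    print $content;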
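If you want to try the substitution without hitting the network, here's a sketch that runs it on a made-up string. The sample bytes are hypothetical, and I've capped the range at \x{10FFFF} (the highest Unicode code point) rather than \x{1FFFFF}, which makes no difference for this input:

```perl
use strict;
use warnings;

# Hypothetical input: UTF-8 bytes for U+2019, a tab, and a BEL control character.
my $sample = "It\xE2\x80\x99s\ta test\x07";
utf8::decode($sample);

# Same substitution as above, applied to a copy so the original stays intact.
(my $visible = $sample) =~ s{
    ([\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x{10FFFF}])
}{
    sprintf('[U+%04X]', ord($1))
}gex;

print "$visible\n";
```

The smart quote and the BEL come out as `[U+2019]` and `[U+0007]`, while the tab is left alone because \x09 isn't in the character class.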