in reply to Re: converting smart quotes
in thread converting smart quotes
Hi all, thank you so much for your comments and suggestions, and many apologies for my bad linking and explanations. Some responses:
First off, ww, I'm not sure why you don't see the "What's new" string. Perhaps these screen grabs from the URL above will help show what I'm talking about (I hope I'm not breaking a rule by posting them):
Second, I believe your example #2 is the smart quote I'm discussing, though it looks slightly different in my text editor than in my browser. Here is the text pasted in:
What’s new
As for my specific Perl code:
use strict;
use warnings;
use LWP::UserAgent;

my $browser  = LWP::UserAgent->new;
my $response = $browser->get( "http://publib.boulder.ibm.com/infocenter/brjrules/v7r0m3/basic/tocView.jsp?toc=/com.ibm.websphere.ilog.jrules.doc/toc.xml" );
my $content  = $$response{_content};    ## yes inefficient coding, but it works
open( my $out, '>', 'content.html' ) or die "Cannot open content.html: $!";
print $out $content;
close($out);
Adding utf8::decode to that, as suggested:
utf8::decode($content);
$content =~ s {
    ([\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x{1FFFFF}])
} {
    sprintf('[U+%04X]', ord($1))
}gex;
produces this:
What[U+2019]s new
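To see why the utf8::decode step matters, here is a minimal sketch that runs the same substitution on the UTF-8 bytes for "What’s new", once before decoding and once after. It assumes the page is served as UTF-8, so ’ arrives as the three bytes E2 80 99; the sub name mark_non_ascii is hypothetical and only there so the substitution can be run twice.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the post.
sub mark_non_ascii {
    my ($text) = @_;
    $text =~ s{
        ([\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x{1FFFFF}])
    }{
        sprintf('[U+%04X]', ord($1))
    }gex;
    return $text;
}

# Assumed input: the raw UTF-8 bytes, where ’ is E2 80 99.
my $bytes = "What\xE2\x80\x99s new";

# Without decoding, every byte is treated as a separate character.
my $raw = mark_non_ascii($bytes);

# utf8::decode turns the three bytes into the single character U+2019.
my $chars = $bytes;
utf8::decode($chars);
my $decoded = mark_non_ascii($chars);

print "$raw\n";        # What[U+00E2][U+0080][U+0099]s new
print "$decoded\n";    # What[U+2019]s new
```

Before decoding, Perl has no way to know those three bytes belong together, which is why a character-level match for U+2019 cannot succeed on the raw content.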
At least it's finding it! But I confess I don't follow the regex there (I'm still learning...). Also, is there some shortcut in the code that I'm missing?
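If the shortcut in question is a way to turn curly quotes into plain ASCII ones (the goal in the thread title), one possible sketch is a single tr/// over the already-decoded string. The helper name straighten_quotes is hypothetical, and the choice of which four characters to map is an assumption; other quote-like characters would need adding to both lists.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the four common curly quotes with ASCII ones.
# Expects a decoded (character) string, e.g. after utf8::decode($content).
sub straighten_quotes {
    my ($text) = @_;
    # tr/// maps position by position:
    #   U+2018 LEFT SINGLE QUOTATION MARK  -> '
    #   U+2019 RIGHT SINGLE QUOTATION MARK -> '
    #   U+201C LEFT DOUBLE QUOTATION MARK  -> "
    #   U+201D RIGHT DOUBLE QUOTATION MARK -> "
    $text =~ tr/\x{2018}\x{2019}\x{201C}\x{201D}/''""/;
    return $text;
}

my $content = "What\x{2019}s new";          # decoded text, as in the post
print straighten_quotes($content), "\n";    # What's new
```

Because tr/// works character by character, this only helps after decoding; on the raw bytes it never sees U+2019. If other non-ASCII characters remain, printing to a plain filehandle triggers a "Wide character in print" warning, which opening the output file with '>:encoding(UTF-8)' avoids.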
Sorry if I'm asking dumb questions or just not getting it. I would like to understand that regex better: is there somewhere I can learn more about it?
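Here is the same substitution laid out with comments, as a sketch for reading the regex piece by piece. The single character class from the post is split into two classes joined by | only so each part can carry a comment (a # inside a [...] class is taken literally, even under /x); it matches the same characters. The sub name show_codepoints is hypothetical. For further reading, the perlretut tutorial and the perlre reference (both available via perldoc) cover character classes and /x, and perlop documents s/// with its /e and /g modifiers.

```perl
use strict;
use warnings;

# Hypothetical sub: the substitution from the post, laid out with /x.
sub show_codepoints {
    my ($text) = @_;
    $text =~ s{
        (                                  # capture one character as group 1:
            [\x00-\x08\x0B\x0C\x0E-\x1F]   #   a control char other than TAB, LF and CR
          |                                #   or
            [\x80-\x{1FFFFF}]              #   any character above 7-bit ASCII
        )
    }{
        # /e makes this replacement Perl code, run once per match.
        # ord($1) is the matched character's code point, and %04X
        # formats it as upper-case hex, padded to at least 4 digits.
        sprintf('[U+%04X]', ord($1))
    }gex;    # g = every match, e = evaluate replacement, x = allow layout/comments
    return $text;
}

print show_codepoints("What\x{2019}s new"), "\n";    # What[U+2019]s new
```

Tabs, newlines and carriage returns are deliberately left out of the first class, so ordinary line structure survives while every other control or non-ASCII character becomes a visible [U+XXXX] marker.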
Thank you all once again.
Re^3: converting smart quotes
by tangent (Parson) on Mar 20, 2012 at 12:05 UTC

Re^3: converting smart quotes
by ww (Archbishop) on Mar 20, 2012 at 12:14 UTC
by tobyink (Canon) on Mar 20, 2012 at 13:14 UTC
by slugger415 (Monk) on Mar 20, 2012 at 14:49 UTC
by slugger415 (Monk) on Mar 20, 2012 at 14:30 UTC
by tobyink (Canon) on Mar 20, 2012 at 15:27 UTC