PerlMonks  

Re^3: Reg Ex to strip MS smart quotes

by derby (Abbot)
on Aug 19, 2005 at 18:18 UTC


in reply to Re^2: Reg Ex to strip MS smart quotes
in thread Reg Ex to strip MS smart quotes

Are you sure? What problems are you having? Here's the snippet from the code that translates smart-quotes:

$s =~ s/\x93/"/g;
$s =~ s/\x94/"/g;
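As a quick check of those two substitutions, here is a minimal sketch that runs them on a made-up byte string. The sample text is an assumption for illustration only and does not come from the original script:

```perl
use strict;
use warnings;

# Hypothetical sample: CP-1252 bytes 0x93 and 0x94 wrapped around a word.
my $s = "He said \x93hello\x94 and left.";

$s =~ s/\x93/"/g;   # left double smart quote  -> plain "
$s =~ s/\x94/"/g;   # right double smart quote -> plain "

print "$s\n";       # prints: He said "hello" and left.
```

This only works as shown when the input is still raw CP-1252 bytes. If the text has already been decoded to Unicode, the smart quotes will be at other code points and these patterns will not match.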

And here's how I've modified the core demoronise sub:

sub de_cp1252 {
    my( $self, $s ) = @_;

    # Map incompatible CP-1252 characters
    $s =~ s/\x82/,/g;
    $s =~ s-\x83-<em>f</em>-g;
    $s =~ s/\x84/,,/g;
    $s =~ s/\x85/.../g;

    $s =~ s/\x88/^/g;
    $s =~ s-\x89- °/°°-g;

    $s =~ s/\x8B/</g;
    $s =~ s/\x8C/Oe/g;

    $s =~ s/\x91/'/g;
    $s =~ s/\x92/'/g;
    $s =~ s/\x93/"/g;
    $s =~ s/\x94/"/g;
    $s =~ s/\x95/*/g;
    $s =~ s/\x96/-/g;
    $s =~ s/\x97/--/g;
    $s =~ s-\x98-<sup>~</sup>-g;
    $s =~ s-\x99-<sup>TM</sup>-g;

    $s =~ s/\x9B/>/g;
    $s =~ s/\x9C/oe/g;

    # Now check for any remaining untranslated characters.
    if ($s =~ m/[\x00-\x08\x10-\x1F\x80-\x9F]/) {
        for( my $i = 0; $i < length($s); $i++) {
            my $c = substr($s, $i, 1);
            # Same character class as the outer test, so tabs (\x09)
            # are never reported as untranslated.
            if ($c =~ m/[\x00-\x08\x10-\x1F\x80-\x9F]/) {
                printf(STDERR "warning--untranslated character 0x%02X in input line %s\n",
                    unpack('C', $c), $s );
            }
        }
    }

    # Return the translated string.
    $s;
}

I didn't really care about the other stuff (such as bad HTML or Unicode). I just wanted to translate the known misplaced CP-1252 characters into something reasonable.
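For comparison, the same mapping can be written as a lookup table applied in a single pass. This is only a sketch of an alternative layout, not the code from the original script: the names `%cp1252_map` and `de_cp1252_table` are made up here, it is a plain function rather than a method, and it skips the warning pass, leaving unmapped characters unchanged.

```perl
use strict;
use warnings;

# Same replacements as the sub above, collected in one table.
# (%cp1252_map is a hypothetical name used for illustration.)
my %cp1252_map = (
    "\x82" => ',',    "\x83" => '<em>f</em>',   "\x84" => ',,',
    "\x85" => '...',  "\x88" => '^',            "\x89" => ' °/°°',
    "\x8B" => '<',    "\x8C" => 'Oe',           "\x91" => "'",
    "\x92" => "'",    "\x93" => '"',            "\x94" => '"',
    "\x95" => '*',    "\x96" => '-',            "\x97" => '--',
    "\x98" => '<sup>~</sup>',                   "\x99" => '<sup>TM</sup>',
    "\x9B" => '>',    "\x9C" => 'oe',
);

# Hypothetical helper: replace every mapped byte in 0x80-0x9F in one pass;
# bytes without a table entry are left as they are.
sub de_cp1252_table {
    my ($s) = @_;
    $s =~ s/([\x80-\x9F])/exists $cp1252_map{$1} ? $cp1252_map{$1} : $1/ge;
    return $s;
}

print de_cp1252_table("\x93Smart\x94 quotes \x96 it\x92s done\x85"), "\n";
# prints: "Smart" quotes - it's done...
```

The table keeps the character list in one place, which makes it easier to adjust a replacement later. The output should match the chain of substitutions for the mapped characters.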

-derby

Re^4: Reg Ex to strip MS smart quotes
by freddo411 (Chaplain) on Aug 19, 2005 at 20:35 UTC
    Bingo. That snippet is perfect.

    Interestingly, I had already found demoronizer, but I kept looking because I thought it only worked on HTML and output HTML entities.

    Thanks again.

    -------------------------------------
    Nothing is too wonderful to be true
    -- Michael Faraday
