in reply to quotes substitution

One more take on this, just for completeness: normalise the string by decoding it twice, then do the replacement.
    #!/usr/bin/perl
    use warnings;
    use strict;
    use HTML::Entities;

    # The input as it arrived: the quotes are still HTML entities.
    my $line = qq{&#8221; 3&#8221;};
    my $decode_1 = decode_entities $line;
    my $decode_2 = decode_entities $decode_1;

    print qq{$line\n};
    print qq{$decode_1\n};          # contains U+201D, so this warns
    print qq{$decode_2\n};          # contains U+201D, so this warns
    $decode_2 =~ s/\x{201D}/'/g;    # replace each U+201D with '
    print qq{$decode_2\n};

Output:

    Wide character in print at C:\perm\dev\_new.pl line 12.
    Wide character in print at C:\perm\dev\_new.pl line 13.
    &#8221; 3&#8221;
    ” 3”
    ” 3”
    ' 3'
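The ” characters are right double quotation marks (U+201D). The two "Wide character in print" warnings are expected: $decode_1 and $decode_2 hold a character above 0xFF, and STDOUT has no encoding layer.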
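If you would rather not see those warnings, the usual fix is to tell Perl how to encode STDOUT. A minimal sketch, assuming your terminal or output file expects UTF-8 (the binmode line is an addition, not part of the script above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# U+201D RIGHT DOUBLE QUOTATION MARK, built from its code point so the
# source file itself stays plain ASCII.
my $text = "\x{201D} 3\x{201D}";

# Without an output layer Perl cannot know which bytes to emit for a
# character above 0xFF, so it writes UTF-8 and warns "Wide character".
# Declaring the encoding of STDOUT once removes the warning.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";

# The same substitution as in the script above, on a copy of the string.
(my $ascii = $text) =~ s/\x{201D}/'/g;
print "$ascii\n";
```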

I have a general rule of thumb: decode often (it doesn't hurt), encode ONCE (or your life will be a misery and you'll have strings of junk like the one you had). :-)
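To illustrate the "strings of junk" half of that rule: encoding an already-encoded string escapes its ampersands a second time. The sketch below uses a hypothetical encode_once helper, a deliberately tiny stand-in rather than HTML::Entities' encode_entities, that escapes & and turns every non-ASCII character into a numeric entity:

```perl
use strict;
use warnings;

# Hypothetical minimal encoder: escape '&' first, then replace every
# non-ASCII character with a numeric character reference.
sub encode_once {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $s;
}

my $text  = "\x{201D} 3\x{201D}";
my $once  = encode_once($text);   # &#8221; 3&#8221;
my $twice = encode_once($once);   # &amp;#8221; 3&amp;#8221;
print "$once\n$twice\n";
```

Encoding once gives text a browser renders as the quotes; encoding twice makes the browser show the literal &#8221; instead.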

Re^2: quotes substitution
by afoken (Chancellor) on Jun 30, 2009 at 05:51 UTC
    I have a general rule of thumb: decode often (it doesn't hurt), encode ONCE

    BAD idea. Decode and encode MUST match: decode exactly as many times as the data was encoded. Decoding too often (even once more than needed) DOES hurt. Imagine a piece of HTML source where someone explains how to encode the ampersand in HTML:

        Just write &amp;amp;.

    Decode once (like a browser does):

        Just write &amp;.

    This is what you see in the browser, and it is the correct answer.

    Decode a second time because, well, "it doesn't hurt", as you said:

        Just write &.

    This is just wrong. Decoding too often damages the content.
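    A minimal sketch of the same three steps in Perl. To stay self-contained it uses a hypothetical decode_once helper that knows only a handful of entities, instead of HTML::Entities; like decode_entities it makes a single pass over the string, so each call removes exactly one level of encoding:

    ```perl
    use strict;
    use warnings;

    # Hypothetical single-pass decoder for a few entities only. One s///ge
    # pass never re-scans the text it has just produced.
    my %named = (amp => '&', lt => '<', gt => '>', quot => '"');

    sub decode_once {
        my ($s) = @_;
        $s =~ s{&(amp|lt|gt|quot|#\d+);}
               {substr($1, 0, 1) eq '#' ? chr(substr($1, 1)) : $named{$1}}ge;
        return $s;
    }

    my $source = 'Just write &amp;amp;.';   # what the HTML author typed
    my $once   = decode_once($source);      # Just write &amp;.  (correct)
    my $twice  = decode_once($once);        # Just write &.      (damaged)
    print "$source\n$once\n$twice\n";
    ```

    The second call cannot tell that &amp; was meant literally; once the information about how many levels were applied is gone, over-decoding silently changes the text.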

    This is not HTML-specific: you get the same problem with C-style backslash escapes and with URL encoding. And I'm quite sure there are lots of other encodings whose content is damaged when the decoding routine is applied more than once.
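    The same effect is easy to reproduce with the two other encodings mentioned. A sketch with two small one-pass decoders: url_decode_once performs essentially the substitution URI::Escape's uri_unescape does, while c_unescape_once is a hypothetical helper that understands only \\ and \n:

    ```perl
    use strict;
    use warnings;

    # One pass of percent-decoding.
    sub url_decode_once {
        my ($s) = @_;
        $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        return $s;
    }

    # Hypothetical one-pass C-style unescaper that only knows \\ and \n.
    my %unescape = ('\\' => '\\', n => "\n");
    sub c_unescape_once {
        my ($s) = @_;
        $s =~ s/\\([\\n])/$unescape{$1}/g;
        return $s;
    }

    # The literal text "%41" is URL-encoded once as "%2541".
    my $url       = '%2541';
    my $url_once  = url_decode_once($url);       # %41 (correct)
    my $url_twice = url_decode_once($url_once);  # A   (damaged)

    # The literal text C:\new is written as C:\\new in a C string literal.
    my $c       = 'C:\\\\new';
    my $c_once  = c_unescape_once($c);       # C:\new (correct)
    my $c_twice = c_unescape_once($c_once);  # "C:", newline, "ew" (damaged)

    print "$url_once | $url_twice\n";
    print "$c_once\n[$c_twice]\n";
    ```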

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)