http://qs1969.pair.com?node_id=11138555

BernieC has asked for the wisdom of the Perl Monks concerning the following question:

This should be easy but I can't quite figure a way to do it. I have a text file which is nothing special, except that it uses close-single-quotes {as it should, actually :o)} as apostrophes. I've examined the file and what it has is the sequence {hex} e2 80 99 everyplace a ' should go. It should be easy but I can't sort out how to write a s/???/'/g that'd work.

Re: replacing close-single-quote with apostrophe
by choroba (Cardinal) on Nov 07, 2021 at 21:28 UTC
    You can use the \x notation in a regex:
    s/\xe2\x80\x99/'/g

    If you want to use it in a one-liner, you need to be careful about quoting. In bash, for example, you need to write

    perl -pe 's/\xe2\x80\x99/'\''/g' file

    or use the \x for the single quote, as well:

    perl -pe 's/\xe2\x80\x99/\x27/g' file
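A quick way to check the byte-level pattern without touching the real file is to run it against a string built from the same three bytes. This is only a sketch: the sample text and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: the raw UTF-8 bytes of "It’s easy", written with
# \x escapes so that this script itself stays pure ASCII.
my $bytes = "It\xe2\x80\x99s easy";

# Same substitution as in the one-liners: the three-byte sequence
# is replaced by a single ASCII apostrophe.
(my $fixed = $bytes) =~ s/\xe2\x80\x99/'/g;

print "$fixed\n";    # prints: It's easy
```

Because no decoding happens here, the string is treated as plain bytes from start to finish, which is exactly what the one-liners above do.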

    Alternatively, if you want to use "smart quote" directly:

    use utf8;
    
    open my $in, '<:encoding(UTF-8)', 'file' or die $!;
    while (<$in>) {
        s/’/'/g;
        print;
    }
    

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      For completeness, you can also use the literal character in a one-liner:

      $ echo It’s easy | perl -Cio -pe "s/’/'/g;"
      It's easy
      

      Note that the script, such as it is, is enclosed in double-quotes here only to avoid the problem of escaping the single quote within.


      🦛

        That command is wrong. You want

        perl -CSD -Mutf8 -pe"s/’/'/g;"

        -Cio has no effect in that example, which you can see by removing it.

        $ echo It’s easy | perl -pe"s/’/'/g;"
        It's easy

        -Ci only has an effect if reading from a file

        $ perl -Cio -pe"s/’/'/g;" <( echo "It’s easy" )
        Wide character in print at -e line 1, <> line 1.
        It’s easy

        This is what you want if reading from a file:

        $ perl -CiO -Mutf8 -pe"s/’/'/g;" <( echo "It’s easy" )
        It's easy

        This is what you want if reading from STDIN:

        $ echo "It’s easy" | perl -CIO -Mutf8 -pe"s/’/'/g;"
        It's easy

        Combining both, you can use

        perl -CiIO -Mutf8 -pe"s/’/'/g;"

        Better:

        perl -CSD -Mutf8 -pe"s/’/'/g;"
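What those switches arrange on the command line can be spelled out with explicit I/O layers in a script. The following is only a sketch: it uses in-memory filehandles as a stand-in for real files so that it is self-contained, the sample text is invented, and it assumes the script itself is saved as UTF-8 (that is what `use utf8` declares).

```perl
use strict;
use warnings;
use utf8;    # the source below contains a literal ’ (assumes the file is saved as UTF-8)

# Hypothetical input: the bytes of "café isn’t" as they would sit on disk.
my $raw = "caf\xc3\xa9 isn\xe2\x80\x99t\n";

# The input layer decodes bytes into characters ...
open my $in, '<:encoding(UTF-8)', \$raw or die $!;

# ... and the output layer encodes characters back into bytes.
my $out_bytes = '';
open my $out, '>:encoding(UTF-8)', \$out_bytes or die $!;

while (my $line = <$in>) {
    $line =~ s/’/'/g;        # character-level match, thanks to use utf8
    print {$out} $line;      # no "Wide character" warning: the layer encodes
}
close $out;
close $in;
```

After the loop, `$out_bytes` holds the re-encoded text: the é is still two bytes, and the quote is a plain apostrophe.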
      Perfect -- exactly what I was looking for.. THANKS
Re: replacing close-single-quote with apostrophe
by LanX (Saint) on Nov 08, 2021 at 01:17 UTC
    That's the UTF-8 encoding of the code point U+2019, RIGHT SINGLE QUOTATION MARK.

    You'd do better to rely on proper encoding of your script and the input layer and just write s/’/'/g, as choroba showed you in the second part of his reply.

    Fiddling with bytes instead of characters isn't a clean approach.
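To make the bytes-versus-characters distinction concrete, here is a small sketch using the core Encode module; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes      = "\xe2\x80\x99";       # three raw bytes, as found in the file
my $byte_count = length $bytes;        # 3

my $char       = decode('UTF-8', $bytes);   # one decoded character
my $char_count = length $char;              # 1

printf "%d bytes, %d character, U+%04X\n",
    $byte_count, $char_count, ord $char;
# prints: 3 bytes, 1 character, U+2019
```

Once the input is decoded, a pattern only ever has to mention that single character, however many bytes the encoding happened to use for it.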

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re: replacing close-single-quote with apostrophe
by BillKSmith (Monsignor) on Nov 08, 2021 at 15:34 UTC
    As others have already shown, it is best to separate decoding from translation. The following code demonstrates a few minor improvements. Using character names lets you type your script in pure ASCII. It also makes your intention clear without any concern about how very similar-looking glyphs are displayed (probably the same reason that you used informal character names in your post). I prefer the tr/// operator to s///g for fixed substitutions such as this.
    use strict;
    use warnings;
    use Encode 'decode';
    use Test::More tests=>1;

    my $in_file = qq(This isn\xe2\x80\x99t hard);
    my $text = decode('utf-8', $in_file);
    $text =~ tr/\N{RIGHT SINGLE QUOTATION MARK}/\N{APOSTROPHE}/;
    is( $text, q(This isn't hard), 'quote mark test' );

    OUTPUT:

    1..1
    ok 1 - quote mark test
    Bill