Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have files in which I want to make some substitutions to get rid of certain UTF-8 characters (like replacing "smart quotes" with plain quotes). So I tried:
#!/usr/bin/perl -w
use strict;
use utf8;
open (FILE, $ARGV[0]) or die "Can't read $ARGV[0].\n";
my $fileln = <FILE>;
$fileln =~ tr/\’/\'/;
This however does not make any changes. What am I doing wrong? If it matters, I am using perl 5.8.8.

Re: "Asciitizing" utf8
by ikegami (Patriarch) on Mar 26, 2010 at 06:42 UTC
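    You're comparing the encoded form of the character (the raw UTF-8 bytes read from the file) with the decoded character (the literal in your source, which "use utf8" turns into a single character). Decode the input as you read it. Fix: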
    #!/usr/bin/perl -w
    use strict;
    use utf8;
    open(my $fh, '<:encoding(UTF-8)', $ARGV[0]) or die "Can't read $ARGV[0]: $!\n";
    my $fileln = <$fh>;
    $fileln =~ tr/’/'/;
    ...[ do something with $fileln ]...

    In case you don't realize it, you didn't save your changes anywhere or do anything with them.
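
To make the byte-versus-character point concrete, here is a minimal sketch using the core Encode module. It simulates the undecoded read by encoding a string by hand, and the variable names are only illustrative. The escape \x{2019} is used instead of a literal ’ so the example does not depend on how the source file is saved; under "use utf8" the literal means the same thing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Simulate what a plain open() hands back: raw UTF-8 bytes, not characters.
# U+2019 (right single quotation mark) becomes the three bytes E2 80 99.
my $raw = encode('UTF-8', "It\x{2019}s\n");

# tr/// on the undecoded bytes: there is no U+2019 character to match.
(my $bytes_attempt = $raw) =~ tr/\x{2019}/'/;

# Decode first, and the very same tr/// now matches.
my $text = decode('UTF-8', $raw);
(my $fixed = $text) =~ tr/\x{2019}/'/;

printf "raw length: %d, decoded length: %d\n", length $raw, length $text;
print  "bytes attempt changed: ", ($bytes_attempt eq $raw ? "no" : "yes"), "\n";
print  "decoded result: $fixed";
```

The raw string is 7 bytes long while the decoded one is 5 characters, which is why the transliteration only takes effect after decoding.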

Re: "Asciitizing" utf8
by CountZero (Bishop) on Mar 26, 2010 at 07:03 UTC
    Text::Unidecode might do (part of) what you want.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: "Asciitizing" utf8
by Anonymous Monk on Mar 26, 2010 at 07:08 UTC
    perl -Mopen=:std,:encoding(UTF-8) -pe " y/\x{2018}\x{2019}\x{201B}\x{201C}\x{201D}\x{201F}/\x27\x27\x27\x22\x22\x22/; " < input > output
    or multiple files at once, creating a backup of each
    perl -i.orig -Mopen=:std,:encoding(UTF-8) -pe " y/\x{2018}\x{2019}\x{201B}\x{201C}\x{201D}\x{201F}/\x27\x27\x27\x22\x22\x22/; " in1 in2 in3 in4 in5
    On a side note, the above one-liner crashes with -MO=Deparse on Win32
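
For anyone who prefers a script to a one-liner, the same transliteration can be sketched roughly as follows. The names asciitize_quotes and convert_file and the file name asciitize.pl are hypothetical, chosen only for illustration. It assumes the input is valid UTF-8, and it writes the result to a second file so the changes are actually saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace typographic single and double quotes with their ASCII
# counterparts. Expects an already decoded (character) string.
sub asciitize_quotes {
    my ($text) = @_;
    $text =~ tr/\x{2018}\x{2019}\x{201B}\x{201C}\x{201D}\x{201F}/\x27\x27\x27\x22\x22\x22/;
    return $text;
}

# Copy $in_path to $out_path line by line, decoding on input and
# encoding on output via the :encoding(UTF-8) layers.
sub convert_file {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<:encoding(UTF-8)', $in_path)  or die "Can't read $in_path: $!\n";
    open(my $out, '>:encoding(UTF-8)', $out_path) or die "Can't write $out_path: $!\n";
    while (my $line = <$in>) {
        print {$out} asciitize_quotes($line);
    }
    close($in);
    close($out) or die "Can't close $out_path: $!\n";
    return;
}

# Run as: perl asciitize.pl input.txt output.txt
convert_file(@ARGV) if @ARGV == 2;
```

Keeping the substitution in its own subroutine makes it easy to extend the character list later without touching the file handling.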