"Asciitizing" utf8

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have files in which I want to make some substitutions to get rid of certain utf8 characters (like replacing "smart quotes" with plain quotes). So I tried:

#!/usr/bin/perl -w
use strict;
use utf8;

open (FILE, $ARGV[0]) or die "Can't read $ARGV[0].\n";

my $fileln = <FILE>;

$fileln =~ tr/\’/\'/;

This however does not make any changes. What am I doing wrong? If it matters, I am using perl 5.8.8.

Re: "Asciitizing" utf8 by ikegami (Patriarch) on Mar 26, 2010 at 06:42 UTC
You're comparing the encoded form of the character with the character. Fix:

#!/usr/bin/perl -w
use strict;
use utf8;

open(my $fh, '<:encoding(UTF-8)', $ARGV[0])
    or die "Can't read $ARGV[0]: $!\n";

my $fileln = <$fh>;

$fileln =~ tr/’/'/;

...[ do something with $fileln ]...

In case you don't realize it, you didn't save your changes anywhere or do anything with them.
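To make the "encoded form versus character" point concrete, here is a minimal sketch using the core Encode module. The sample string is made up for illustration; it stands in for a line read from a file opened without an :encoding layer, where U+2019 arrives as the three bytes E2 80 99.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                 # literals in this source file are characters
use Encode qw(decode);

# Made-up sample: "it’s" as raw UTF-8 bytes, as an undecoded read would deliver it.
my $raw = "it\xE2\x80\x99s";

# tr/// looks for the single character U+2019. The byte string only
# contains E2, 80 and 99, so nothing matches and nothing changes.
(my $undecoded = $raw) =~ tr/\x{2019}/'/;

# Decoding turns the three bytes into one character, and tr/// now matches.
my $decoded = decode('UTF-8', $raw);
(my $fixed = $decoded) =~ tr/\x{2019}/'/;

printf "raw length:     %d\n", length $raw;       # 6
printf "decoded length: %d\n", length $decoded;   # 4
print  "undecoded copy is unchanged\n" if $undecoded eq $raw;
print  "fixed: $fixed\n";                         # fixed: it's
```

Opening the file with '<:encoding(UTF-8)' as in the fix above does the same decoding for every line read, so no explicit decode call is needed there.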
Re: "Asciitizing" utf8 by CountZero (Bishop) on Mar 26, 2010 at 07:03 UTC
Text::Unidecode might do (part of) what you want.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re: "Asciitizing" utf8 by Anonymous Monk on Mar 26, 2010 at 07:08 UTC
perl -Mopen=:std,:encoding(UTF-8) -pe " y/\x{2018}\x{2019}\x{201B}\x{201C}\x{201D}\x{201F}/\x27\x27\x27\x22\x22\x22/; " < input > output

or multiple files at once, creating a backup of each

perl -i.orig -Mopen=:std,:encoding(UTF-8) -pe " y/\x{2018}\x{2019}\x{201B}\x{201C}\x{201D}\x{201F}/\x27\x27\x27\x22\x22\x22/; " in1 in2 in3 in4 in5

On a side note, the above one-liner crashes with `-MO=Deparse` on win32
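For anyone who would rather have this as a script than a one-liner, a rough equivalent might look like the sketch below. The sub names asciitize_quotes and asciitize_file are made up for illustration; the character mapping is the same one used in the y/// above, and the script writes to a separate output file rather than editing in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper name. Same mapping as the one-liner:
# U+2018, U+2019, U+201B become ' (0x27); U+201C, U+201D, U+201F become " (0x22).
sub asciitize_quotes {
    my ($text) = @_;
    $text =~ tr/\x{2018}\x{2019}\x{201B}\x{201C}\x{201D}\x{201F}/\x27\x27\x27\x22\x22\x22/;
    return $text;
}

# Hypothetical helper name. Reads $in_path as UTF-8, replaces the quotes,
# and writes the result to $out_path as UTF-8.
sub asciitize_file {
    my ($in_path, $out_path) = @_;
    open(my $in, '<:encoding(UTF-8)', $in_path)
        or die "Can't read $in_path: $!\n";
    open(my $out, '>:encoding(UTF-8)', $out_path)
        or die "Can't write $out_path: $!\n";
    while (my $line = <$in>) {
        print {$out} asciitize_quotes($line);
    }
    close($out) or die "Can't close $out_path: $!\n";
    close($in);
    return;
}

# Only runs when called with exactly two arguments: input file, output file.
asciitize_file(@ARGV) if @ARGV == 2;
```

Both filehandles carry an :encoding(UTF-8) layer, so the tr/// sees decoded characters on the way in, and any remaining non-ASCII characters are re-encoded on the way out.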