quick translate utf8 to latin1 encoding

codeacrobat has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks. I feel that my conversion from UTF-8 to Latin-1 involves too much typing.

# utf8 -> latin1 encoding
cat utf8.txt |perl -MEncode=from_to -pe 'from_to($_,"utf8","latin1");'

The reverse operation, Latin-1 to UTF-8, is a quickie. It can be as short as:

echo Döner | perl -C6 -pe1

I believe that Perl v5.6 had a short way to convert the encodings.

    tr/\0-\xFF//CU;        # change Latin-1 to Unicode
    tr/\0-\x{FF}//UC;        # change Unicode to Latin-1

Is there a similarly short way in Perl versions after v5.6?

Replies are listed 'Best First'.
Re: quick translate utf8 to latin1 encoding by almut (Canon) on Feb 15, 2007 at 20:59 UTC
`cat utf8.txt | perl -C1 -pe1 >latin1.txt`

At least, this works for me :) (All UTF-8 characters to convert must be representable in the Latin-1 character set.)
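For readers who prefer to see the layers spelled out, here is a minimal sketch of a longer-form variant of the one-liner above. It is not the one-liner's exact mechanism: it puts an explicit `:encoding` layer on both ends instead of relying on `-C1` and an unlayered STDOUT. The helper name `copy_utf8_to_latin1` and the in-memory demo handles are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, decoding UTF-8 on input and
# encoding Latin-1 on output by means of PerlIO layers.
sub copy_utf8_to_latin1 {
    my ($in, $out) = @_;
    binmode($in,  ':encoding(UTF-8)');
    binmode($out, ':encoding(ISO-8859-1)');
    print {$out} $_ while <$in>;
    return;
}

# Demo on in-memory handles; "D\xC3\xB6ner" is "Döner" as UTF-8 bytes.
my $utf8_bytes = "D\xC3\xB6ner\n";
open(my $in,  '<', \$utf8_bytes) or die "open in: $!";
open(my $out, '>', \my $latin1)  or die "open out: $!";
copy_utf8_to_latin1($in, $out);
close($out) or die "close out: $!";   # flushes the encoding layer

# $latin1 now holds "D\xF6ner\n", one byte per character.
```

In a real filter the same helper could presumably be called as `copy_utf8_to_latin1(\*STDIN, \*STDOUT)`.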
Re^2: quick translate utf8 to latin1 encoding by ikegami (Patriarch) on Feb 16, 2007 at 01:43 UTC
It works fine for characters in the native charset...

    # feeder.pl
    $s = join '', map chr, 0..255;
    print $s;

    # tester.pl
    $s = join '', map chr, 0..255;
    1 while $read = read(STDIN, $in, 512, $ofs+=$read);
    print($in eq $s ? "pass\n" : "fail\n");

    $ perl feeder.pl | perl tester.pl
    pass
    $ perl feeder.pl | perl -C6 -pe1 | perl -C1 -pe1 | perl tester.pl
    pass

... But it doesn't work so well for characters not in the native charset.

    $ perl -C6 -e 'print "\x{2660}"' | perl -C1 -pe1
    Wide character in print, <> line 1.
    [junk][junk][junk]

Compare to

    $ perl -C6 -e 'print "\x{2660}"' | perl -C1 -MEncode -pe '$_=encode("iso-latin-1",$_);'
    ?

Up to the reader to choose if that's good enough.
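If neither the "Wide character" junk nor a silent `?` is acceptable, one option is to ask Encode to fail loudly instead. The following is a sketch under that assumption, not something proposed in the thread: the helper name `utf8_to_latin1_strict` is hypothetical, and it passes `Encode::FB_CROAK` as the CHECK argument so that unmappable characters raise an exception, which the helper turns into `undef`.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return Latin-1 bytes for a UTF-8 byte string, or
# undef if the input is malformed or contains a character that Latin-1
# cannot represent.
sub utf8_to_latin1_strict {
    my ($utf8_bytes) = @_;
    my $latin1 = eval {
        # FB_CROAK makes both steps die instead of substituting.
        my $chars = decode('UTF-8', $utf8_bytes, Encode::FB_CROAK());
        encode('ISO-8859-1', $chars, Encode::FB_CROAK());
    };
    return $latin1;   # undef when decode or encode croaked
}

my $ok  = utf8_to_latin1_strict("D\xC3\xB6ner");   # "D\xF6ner"
my $bad = utf8_to_latin1_strict("\xE2\x99\xA0");   # U+2660 as UTF-8: undef
```

The design choice here is only about the failure mode: without a CHECK argument, `encode` substitutes `?` as in the transcript above, while this variant lets the caller decide what to do with input that does not fit.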
Re: quick translate utf8 to latin1 encoding by codeacrobat (Chaplain) on Feb 15, 2007 at 23:34 UTC
From perldoc perlrun:

    -C [number/list]
         The "-C" flag controls some Unicode of the Perl Unicode features.
         ...
             I     1   STDIN is assumed to be in UTF-8

What I am missing there is a note saying that it "can be used to translate from utf-8 into native latin-1 encoding". That is what most people look for. Or maybe I am just stupid. ;-)
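To make the numbers used in this thread (`-C1`, `-C6`) easier to read, here is a small sketch that spells a numeric `-C` value out as its letters. The letter/number pairs are taken from the table in perldoc perlrun; only a subset is listed, and the helper name `c_letters` is an assumption made for illustration.

```perl
use strict;
use warnings;

# Letter/number pairs as listed in perldoc perlrun (subset).
# The combined letters S (= I+O+E) and D (= i+o) are left out on purpose.
my @C_FLAGS = (
    [I => 1],  [O => 2],  [E => 4],
    [i => 8],  [o => 16], [A => 32], [L => 64],
);

# Hypothetical helper: which letters does a numeric -C value select?
sub c_letters {
    my ($number) = @_;
    return join '', map { $_->[0] } grep { $number & $_->[1] } @C_FLAGS;
}

print c_letters(1), "\n";   # I
print c_letters(6), "\n";   # OE
```

Read this way, `-C1` only marks STDIN as UTF-8, and `-C6` only marks STDOUT and STDERR as UTF-8. The conversion comes from the end that is left unmarked.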
Re^2: quick translate utf8 to latin1 encoding by Joost (Canon) on Feb 15, 2007 at 23:46 UTC
AFAIK, without any additional parameters the output depends on your locale settings, which probably (for LANG=C) default to Latin-1. I might be wrong though.

"What should it profit a man, if he should win a flame war, yet lose his cool?"