codeacrobat has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks. I feel that my conversion from utf8 to latin1 involves too much typing.
# utf8 -> latin1 encoding
cat utf8.txt | perl -MEncode=from_to -pe 'from_to($_,"utf8","latin1");'
The reverse operation, latin1 to utf8, is a quickie. It can be as short as:
echo Döner | perl -C6 -pe1
I believe that Perl v5.6 had a short way to convert the encodings.
tr/\0-\xFF//CU;     # change Latin-1 to Unicode
tr/\0-\x{FF}//UC;   # change Unicode to Latin-1
Is there a short way in Perl > v5.6?

Re: quick translate utf8 to latin1 encoding
by almut (Canon) on Feb 15, 2007 at 20:59 UTC
    cat utf8.txt | perl -C1 -pe1 >latin1.txt

    At least, this works for me :)

    (All UTF-8 characters to convert must be representable in the Latin1 character set.)

      It works fine for characters in the native charset...

      # feeder.pl
      $s = join '', map chr, 0..255;
      print $s;

      # tester.pl
      $s = join '', map chr, 0..255;
      1 while $read = read(STDIN, $in, 512, $ofs+=$read);
      print($in eq $s ? "pass\n" : "fail\n");

      $ perl feeder.pl | perl tester.pl
      pass
      $ perl feeder.pl | perl -C6 -pe1 | perl -C1 -pe1 | perl tester.pl
      pass

      ... But it doesn't work so well for characters not in the native charset.

      $ perl -C6 -e 'print "\x{2660}"' | perl -C1 -pe1
      Wide character in print, <> line 1.
      [junk][junk][junk]

      Compare to

      $ perl -C6 -e 'print "\x{2660}"' | perl -C1 -MEncode -pe '$_=encode("iso-latin-1",$_);'
      ?

      Up to the reader to choose if that's good enough.
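
If the silent "?" substitution is not good enough, Encode's optional third CHECK argument lets you pick another policy. What follows is only a sketch: the helper name to_latin1 and the sample string are made up for illustration, and the helper encodes a copy because some CHECK values modify the source string in place.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: encode a character string to Latin-1 under a CHECK policy.
# Works on a copy, since some CHECK values alter the source string.
sub to_latin1 {
    my ($text, $check) = @_;
    my $copy = $text;
    return Encode::encode('iso-8859-1', $copy, $check);
}

# Render bytes as hex so the result is readable on any terminal.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $text = "D\x{F6}ner \x{2660}";   # o-umlaut fits in Latin-1, U+2660 does not

# FB_DEFAULT: unmappable characters are replaced by "?"
my $lossy = to_latin1($text, Encode::FB_DEFAULT());

# FB_PERLQQ: unmappable characters become a \x{HHHH} escape
my $escaped = to_latin1($text, Encode::FB_PERLQQ());

# FB_CROAK: die instead of losing data silently
my $strict = eval { to_latin1($text, Encode::FB_CROAK()) };
my $error  = $@;

print "default: ", hex_bytes($lossy),   "\n";
print "perlqq:  ", hex_bytes($escaped), "\n";
print "croak:   ", (defined $strict ? "converted" : "refused: $error"), "\n";
```

FB_CROAK is probably the safer choice for batch conversions, since a file that does not fit into Latin-1 then fails loudly instead of coming out full of question marks.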

Re: quick translate utf8 to latin1 encoding
by codeacrobat (Chaplain) on Feb 15, 2007 at 23:34 UTC
    From perldoc perlrun:

    -C [number/list]
        The "-C" flag controls some of the Perl Unicode features.
        ...
        I     1   STDIN is assumed to be in UTF-8
    What I am missing there is a statement like "can be used to translate from UTF-8 into the native Latin-1 encoding". That is what most people look for. Or maybe I am just stupid. ;-)
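
One way to read the one-liner: -C1 only puts a UTF-8 layer on STDIN, while STDOUT stays a plain byte stream, so every character up to 0xFF goes out as a single byte, which is Latin-1 on a Latin-1-native platform. A rough equivalent with both layers spelled out might look like the sketch below. The helper name utf8_to_latin1 is hypothetical, and in-memory filehandles stand in for STDIN and STDOUT so the example is self-contained. Unlike the one-liner, the explicit output layer decides what happens to characters that do not fit.

```perl
use strict;
use warnings;

# Hypothetical helper: decode UTF-8 bytes and re-encode them as ISO-8859-1
# using PerlIO layers on in-memory filehandles.
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;
    my $latin1_bytes = '';

    open my $in,  '<:encoding(UTF-8)',      \$utf8_bytes   or die "open in: $!";
    open my $out, '>:encoding(iso-8859-1)', \$latin1_bytes or die "open out: $!";

    # Same loop that -p wraps around the program: read a line, print it.
    print {$out} $_ while <$in>;

    close $out or die "close out: $!";
    close $in;
    return $latin1_bytes;
}

# "Döner\n" as UTF-8 bytes: the o-umlaut is the two bytes C3 B6.
my $latin1 = utf8_to_latin1("D\xC3\xB6ner\n");

# Show the result as hex bytes.
print join(' ', map { sprintf '%02X', ord } split //, $latin1), "\n";
```

For real files the same two layers can be set with binmode on STDIN and STDOUT.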