in reply to UTF-8 to Latin1 - unmatched characters?

Text::Unidecode, optionally using PerlIO::via::Unidecode should do the trick.

UPDATE: Yup, it does:

$ perl -MText::Unidecode -le 'print unidecode("\x{201c} \x{2013}")'
" -

-sam

Re^2: UTF-8 to Latin1 - unmatched characters?
by uncommon13 (Novice) on Mar 27, 2008 at 15:08 UTC
    Thanks, Sam. I used this; however, it also converts the other, valid Latin-1 characters to ASCII.

    So I found this, which converts the UTF-8 characters that have no Latin-1 match to a replacement: http://linuxgazette.net/117/tag/4.html

    So basically, the code would be something like:

    # Converted UTF codes for non-matching ISO-8859-1
    # Strip it down to basic ASCII
    %utf_entity = (
        "\x{2019}", "'",
        "\x{201c}", '"',
        "\x{201d}", '"',
        "\x{2026}", "...",
        "\x{fffd}", "",
    );

    s/(\X)/ exists $utf_entity{$1} ? $utf_entity{$1} : $1 /eg;
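For anyone who wants to try the table approach end to end, here is a minimal sketch using only the core Encode module. The sub name to_latin1 and the sample input are hypothetical; the table is the one above. It assumes the input arrives as UTF-8 bytes and that anything still unmatched after the substitution may become "?", which is Encode's default substitution for iso-8859-1.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Same replacement table as in the post above.
my %utf_entity = (
    "\x{2019}" => "'",
    "\x{201c}" => '"',
    "\x{201d}" => '"',
    "\x{2026}" => "...",
    "\x{fffd}" => "",
);

# Hypothetical helper: UTF-8 bytes in, Latin-1 bytes out.
sub to_latin1 {
    my ($utf8_bytes) = @_;

    # Turn the UTF-8 bytes into Perl characters first.
    my $text = decode('UTF-8', $utf8_bytes);

    # Replace only the characters listed in the table;
    # everything else, including valid Latin-1, is kept.
    $text =~ s/(\X)/ exists $utf_entity{$1} ? $utf_entity{$1} : $1 /eg;

    # Characters with no Latin-1 match that are not in the
    # table are replaced by "?" (Encode's default behaviour).
    return encode('iso-8859-1', $text);
}
```

Calling to_latin1("caf\xC3\xA9 \xE2\x80\x9Cok\xE2\x80\x9D") should keep the e-acute as the single Latin-1 byte 0xE9 and turn the curly quotes into plain double quotes.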
      I was going to recommend passing only characters that don't exist in iso-latin-1 to unidecode using a fallback handler to encode. It works, but I'm getting an error (Close with partial character.) when the file handle is closed, and I have no idea how to fix it.

      Here's the code anyway:

      use strict;
      use warnings;

      use PerlIO::encoding qw( );
      use Text::Unidecode  qw( unidecode );

      use constant FB_UNIDECODE => sub { unidecode(chr($_[0])) };

      my $file = '...';

      local $PerlIO::encoding::fallback = FB_UNIDECODE;
      open(my $fh, '>:encoding(iso-8859-1)', $file)
         or die("Unable to create file \"$file\": $!\n");
      print $fh "abc\x{201C}def\x{2013}ghi";
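The same fallback idea can also be tried on an in-memory string, without a file handle. This is only a sketch under some assumptions: it calls Encode::encode directly with a code reference as the CHECK argument instead of going through the PerlIO layer, and it swaps unidecode for a small lookup table so that it runs with core modules only. The names %replacement and encode_latin1_with_fallback are hypothetical.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical stand-in for unidecode: a few characters that
# have no Latin-1 match, mapped to ASCII replacements.
my %replacement = (
    "\x{2013}" => "-",
    "\x{2019}" => "'",
    "\x{201c}" => '"',
    "\x{201d}" => '"',
    "\x{2026}" => "...",
);

# Hypothetical helper: Perl characters in, Latin-1 bytes out.
sub encode_latin1_with_fallback {
    my ($text) = @_;

    # Encode calls the code reference only for characters it
    # cannot represent in iso-8859-1, passing the code point.
    return encode('iso-8859-1', $text, sub {
        my ($codepoint) = @_;
        my $char = chr $codepoint;
        return exists $replacement{$char} ? $replacement{$char} : '?';
    });
}
```

With this, encode_latin1_with_fallback("abc\x{201C}def\x{2013}ghi") should give abc"def-ghi, while characters that already exist in Latin-1 never reach the fallback and pass through unchanged.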
        Dear ikegami,

        This is exactly what I wanted :)

        It's absolutely brilliant to think about using the fallback handler.

        I don't get the error (Close with partial character.) that you mentioned, though.

        Many thanks again :)