chuckd has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to figure out the best way to convert a UTF16 text file to a UTF8 text file. Currently my group checks for two funny characters within the text; if they are found, we call convert_to_UTF8(\text);
I think this call is from the Unicode::Map module.
Is there a better way to do this?
I read through the open() documentation on Perl.org, and it says that you can open a file like this: open(FH, "<:encoding(UTF-8)", "file")
Does this mean I can use it with a UTF16 file and have it convert all UTF16 characters to UTF8? What is the best way to convert?

Re: best way to convert to UTF8 from UTF16
by ikegami (Patriarch) on Oct 24, 2008 at 01:46 UTC
    #!/usr/bin/perl
    # Usage:
    #    utf16to8.pl infile > outfile
    use strict;
    use warnings;

    binmode(STDOUT, ':raw:encoding(UTF-8)');

    for my $qfn (@ARGV) {
        # Assumes the presence of a BOM.
        open(my $fh, "<:raw:encoding(UTF-16)", $qfn)
            or die("Can't open \"$qfn\": $!\n");
        print while <$fh>;
    }

    :raw is needed to disable the crlf layer if present. The crlf layer would corrupt the data on the UTF-16 side, and the UTF-8 side needs :raw to mirror the UTF-16 side.
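The script above assumes a BOM. If the input has none, one option is to name the byte order explicitly in the layer. The following is only a rough sketch of that idea: the helper name convert_to_utf8_file, the temp file names, and the choice of UTF-16LE for the demo data are all assumptions made for illustration, not something from the post above.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempdir);

# Hypothetical helper: re-encode file $in (stored as $from_enc) to UTF-8 in $out.
sub convert_to_utf8_file {
    my ($in, $out, $from_enc) = @_;

    # :raw first, so no crlf layer sits underneath the encoding layer.
    open(my $ifh, "<:raw:encoding($from_enc)", $in)
        or die("Can't open \"$in\": $!\n");
    open(my $ofh, ">:raw:encoding(UTF-8)", $out)
        or die("Can't create \"$out\": $!\n");

    print {$ofh} $_ while <$ifh>;

    close($ofh) or die("Can't write \"$out\": $!\n");
    close($ifh);
    return;
}

# Demo data: a BOM-less little-endian file with CRLF line endings.
my $dir = tempdir(CLEANUP => 1);
my ($in, $out) = ("$dir/in.txt", "$dir/out.txt");

open(my $fh, '>:raw', $in) or die("Can't create \"$in\": $!\n");
print {$fh} encode('UTF-16LE', "caf\x{e9}\r\nna\x{ef}ve\r\n");
close($fh) or die("Can't write \"$in\": $!\n");

convert_to_utf8_file($in, $out, 'UTF-16LE');

# Read the result back as raw bytes to inspect it.
open($fh, '<:raw', $out) or die("Can't open \"$out\": $!\n");
my $utf8_bytes = do { local $/; <$fh> };
close($fh);

printf("Wrote %d bytes of UTF-8\n", length($utf8_bytes));
```

Because both handles use :raw, the CRLF line endings in the demo data pass through unchanged. Swap in 'UTF-16BE' for big-endian input, or plain 'UTF-16' when a BOM is present.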

Re: best way to convert to UTF8 from UTF16
by trwww (Priest) on Oct 24, 2008 at 03:40 UTC
    $ iconv -f UTF16 -t UTF8 file.UTF16.txt > file.UTF8.txt

    If you need to do it programmatically, use Text::Iconv
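For what it's worth, a minimal sketch of the Text::Iconv route might look like the following. Text::Iconv is a CPAN module rather than part of core Perl, so this sketch loads it at run time and skips the conversion if it is missing. The sample string is made up for the demo, and the encoding names are passed straight to the system's iconv, so which spellings are accepted depends on the platform.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Text::Iconv is not a core module; load it lazily so a missing install
# produces a warning instead of a compile-time failure.
my $have_iconv = eval { require Text::Iconv; 1 };

# Made-up sample input: Encode's 'UTF-16' writes a BOM followed by
# big-endian code units.
my $utf16 = encode('UTF-16', "caf\x{e9}\n");
my $utf8;

if ($have_iconv) {
    # Argument order is (from, to), the reverse of iconv_open() in C.
    my $converter = Text::Iconv->new('UTF-16', 'UTF-8');

    $utf8 = $converter->convert($utf16);
    defined($utf8) or die("Conversion failed\n");

    printf("Converted %d bytes of UTF-16 to %d bytes of UTF-8\n",
        length($utf16), length($utf8));
}
else {
    warn("Text::Iconv is not installed; skipping the conversion\n");
}
```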

Re: best way to convert to UTF8 from UTF16
by moritz (Cardinal) on Oct 24, 2008 at 06:42 UTC
    There's more than one way to do it:
    use Encode qw(from_to);
    ...
    from_to($string, 'UTF-16le', 'UTF-8');
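As a small self-contained sketch of the same call: from_to() works on a string of bytes and rewrites it in place, so the result lands in the variable you pass in rather than in the return value. The sample string here is made up for the demo and stands in for bytes read from a file.

```perl
use strict;
use warnings;
use Encode qw(encode from_to);

# Made-up stand-in for bytes read from a BOM-less UTF-16LE file.
my $string = encode('UTF-16LE', "caf\x{e9}");
my $before = length($string);    # 8 bytes: 4 characters x 2 bytes each

# from_to() modifies $string in place. It returns the new length in
# bytes, or undef if the conversion failed.
my $after = from_to($string, 'UTF-16LE', 'UTF-8');
defined($after) or die("Conversion failed\n");

printf("%d bytes of UTF-16LE became %d bytes of UTF-8\n", $before, $after);
```

Checking the return value is a cheap way to tell whether anything was actually converted.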
      Hi,
      I tried this: from_to($text, "iso-8851-1", "utf8");
      but nothing happened. It looks the same. Am I doing something wrong?
        Why did you change the provided code?