I would have thought the following (quick hack) script would work:
use strict;
use warnings;

my ($inf,$outf)= @ARGV;
$inf or die "Must have a file to process\n";
$outf or $outf= $inf.".utf8";

open my $in, "<:encoding(utf16)", $inf
  or die "Can't open '$inf':$!";
open my $out, ">:utf8", $outf
  or die "Can't write '$outf':$!";

local $/; # slurp mode!
print {$out} <$in> # text
  or die "Failed to convert file:$!";
close $in
  or die "Something weird happened closing '$inf': $!";
close $out
  or die "Failed to close '$outf', file is probably corrupted: $!";
Or even the more elegant one liner:
perl -pe "BEGIN {binmode STDIN, ':encoding(utf16)'; binmode STDOUT, ':utf8'}"
But it doesn't work. If I use an input file containing three Ĕ characters (U+0114), saved as UTF-16 by UltraEdit on Win2k, the input file has the octets FF FE 14 01 14 01 14 01, and after conversion the output file has the octets EF BB BF C2 BE 00 14 00 01 00 14 00 01 00 14 00 01, which is just wrong. Can anybody spot the problem, or is Perl's UTF-16 support borked?
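One way to check Perl's side independently of any editor is to run the conversion in memory with the core Encode module on the exact input octets quoted above. This is a minimal sketch (variable names are illustrative): decoding with the name 'UTF-16' lets the FF FE BOM select little-endian order and strips the BOM, so the expected result is three U+0114 characters, whose UTF-8 encoding is C4 94 repeated three times. Perl's UTF-8 encoder does not add a BOM of its own.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The input octets as reported: little-endian BOM (FF FE) followed by
# three copies of U+0114 (LATIN CAPITAL LETTER E WITH BREVE).
my $utf16_bytes = "\xFF\xFE\x14\x01\x14\x01\x14\x01";

# 'UTF-16' without explicit endianness reads the BOM to choose the
# byte order and removes it from the decoded character string.
my $text = decode('UTF-16', $utf16_bytes);

# Encode the characters as UTF-8 octets; no BOM is prepended.
my $utf8_bytes = encode('UTF-8', $text);

printf "characters: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $text;
printf "UTF-8 octets: %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;
```

Running it should print `characters: U+0114 U+0114 U+0114` followed by `UTF-8 octets: C4 94 C4 94 C4 94`, which is what a correct conversion of that file ought to contain.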
Note that this was with Perl 5.8.6 from ActiveState.
Update: It turns out this was all down to a display bug in UltraEdit. Thanks for the help, and sorry for wasting anybody's time.
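Since the culprit was the editor's display, a simple safeguard is to dump a file's octets with Perl itself instead of trusting an editor's hex view. This is a minimal sketch using only core modules available in 5.8; `hex_octets` is a hypothetical helper name, and the demo writes a known sample to a temporary file via File::Temp so the block runs on its own. The `:raw` layer turns off CRLF translation and any encoding layer, so the hex shown is exactly what is on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the first $max octets of $path as
# space-separated hex pairs, read with no translation layers.
sub hex_octets {
    my ($path, $max) = @_;
    open my $fh, '<:raw', $path or die "Can't open '$path': $!";
    my $got = read($fh, my $buf, $max);
    defined $got or die "Can't read '$path': $!";
    close $fh;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

# Demo: write three UTF-8 encoded U+0114 characters to a temp file,
# then dump the file's octets.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\xC4\x94" x 3;
close $tmp or die "Can't close '$tmpname': $!";

my $dump = hex_octets($tmpname, 16);
print "$dump\n";
```

The demo prints `C4 94 C4 94 C4 94`. To inspect a real converted file, call something like `hex_octets('input.txt.utf8', 32)` (file name hypothetical) and compare the result against what the editor displays.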
In reply to Converting UTF-16 files to UTF-8 by demerphq