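The first program is buggy. You decode without ever encoding: text that was decoded on input has to be encoded again on output, either with encode() or with an :encoding layer on STDOUT. For some inputs you'd get a "Wide character in print" warning; perl issues that one by default for I/O, not only under use warnings.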
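A minimal sketch of what happens when decoded text is printed without being encoded, next to the fix. It writes to in-memory filehandles instead of STDOUT so the resulting bytes can be inspected, and it encodes explicitly to UTF-8 as a stand-in for the :locale layer (assuming a UTF-8 locale). The sample text is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Made-up input: "caf\x{E9}" fits in iso-latin-1, U+263A (WHITE SMILING FACE)
# does not. encode('UTF-16', ...) produces big-endian bytes with a BOM.
my $bytes = encode('UTF-16', "caf\x{E9}\n\x{263A}\n");

# Decode once, then split the decoded characters into lines.
my @lines = split /(?<=\n)/, decode('UTF-16', $bytes);

my ($raw_out, $enc_out);
my @warnings;
{
    # Collect warnings instead of sending them to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    # Buggy: decoded text printed to a handle with no encoding layer.
    open(my $raw_fh, '>', \$raw_out) or die "Can't open in-memory handle: $!";
    print $raw_fh $_ for @lines;
    close($raw_fh);

    # Fixed: encode the characters before they leave the program.
    open(my $enc_fh, '>', \$enc_out) or die "Can't open in-memory handle: $!";
    print $enc_fh encode('UTF-8', $_) for @lines;
    close($enc_fh);
}

printf "unencoded: %s\n", join ' ', map { sprintf '%02X', ord } split //, $raw_out;
printf "encoded:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $enc_out;
printf "warnings:  %d\n", scalar @warnings;
# unencoded: 63 61 66 E9 0A E2 98 BA 0A
# encoded:   63 61 66 C3 A9 0A E2 98 BA 0A
# warnings:  1
```

In the unencoded output, the first line comes out as iso-latin-1 (E9 for é) while the second comes out as UTF-8 and triggers the one warning. The encoded output is UTF-8 throughout.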
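The second program is also buggy, but for a different reason. You presume a line ends at byte 0x0A, but that's not true for UTF-16. A newline there is two bytes (0A 00 in UTF-16le, 00 0A in UTF-16be), and a 0x0A byte can also be half of an unrelated character, so splitting on it can cut characters in half.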
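A small sketch of that failure, using split on made-up UTF-16le data to mimic reading a :raw handle line by line. U+010A is chosen only because its UTF-16le encoding (0A 01) contains a 0x0A byte.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Made-up data: two lines of text. U+010A (LATIN CAPITAL LETTER C WITH DOT
# ABOVE) is encoded in UTF-16LE as the bytes 0A 01.
my $text  = "A\x{010A}B\nCD\n";
my $bytes = encode('UTF-16LE', $text);

# Buggy: break the raw bytes after every 0x0A byte, which is what readline
# does on a :raw handle with the default $/ = "\n".
my @byte_lines = split /(?<=\n)/, $bytes;

# Correct: decode the whole buffer first, then break it into lines.
my @char_lines = split /(?<=\n)/, decode('UTF-16LE', $bytes);

printf "byte pieces: %d (lengths %s)\n", scalar @byte_lines, join ',', map { length($_) } @byte_lines;
printf "text lines:  %d\n", scalar @char_lines;
# byte pieces: 4 (lengths 3,4,6,1)
# text lines:  2
```

The first break lands inside U+010A rather than at a newline, and two of the four pieces have an odd number of bytes, so they cannot be decoded as UTF-16 on their own.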
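They also produce different output. Perl writes a string as iso-latin-1 when all of its characters fit in that encoding and as UTF-8 (with the warning) when they don't, so the first program's output is a mix of iso-latin-1 and UTF-8. The second program outputs UTF-16le or UTF-16be, probably the latter.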
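Fixed versions of both programs follow. In the first, the :encoding layer decodes the input, and use open ':std', ':locale' puts an encoding layer on STDOUT so the text is encoded again on output:

use strict;
use warnings;
use open ':std', ':locale';    # encode/decode STDIN, STDOUT, STDERR per the locale

my $file = shift;
my $enc  = "UTF-16";            # needs a BOM; use UTF-16LE or UTF-16BE if the file has none
open(my $fh, "<:encoding($enc)", $file) || die("Can't open $file: $!");
while ( <$fh> ) {
    print;
}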
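In the second, the whole file is read as raw bytes and decoded in one go, so nothing splits the bytes before decoding:

use strict;
use warnings;
use open ':std', ':locale';    # encode output per the locale
use Encode qw( decode );

my $file = shift;
my $enc  = "UTF-16";
my $bytes = do {
    open(my $fh, "<:raw", $file) || die("Can't open $file: $!");
    local $/;                   # slurp mode: read the whole file at once
    <$fh>
};
my $str = decode($enc, $bytes);
print $str;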
The output seems to have a space between every char
That's usually a sign of UTF-16le/UTF-16be/UCS-2le/UCS-2be being treated as ASCII or a derivative like iso-latin-1 or UTF-8.
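A quick sketch that reproduces the symptom: encode some ASCII text as UTF-16le and read the bytes back as iso-latin-1. Every other byte is 0x00, which becomes a NUL character that many terminals display as a blank. The sample string is arbitrary.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Arbitrary ASCII sample encoded as UTF-16le: 48 00 65 00 6C 00 6C 00 6F 00
my $utf16le = encode('UTF-16LE', 'Hello');

# Wrong: treat those bytes as iso-latin-1, one character per byte.
my $misread = decode('iso-8859-1', $utf16le);

# Replace the NULs with spaces to mimic how a terminal tends to show them.
(my $visible = $misread) =~ tr/\x00/ /;

# Right: decode with the encoding the data is actually in.
my $correct = decode('UTF-16LE', $utf16le);

printf "misread: [%s]\n", $visible;   # misread: [H e l l o ]
printf "correct: [%s]\n", $correct;   # correct: [Hello]
```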