The first program is buggy. You decode the input but never encode the output. You'd get a "Wide character" warning for some inputs if you had warnings on.
The second program is also buggy, but for a different reason. You presume a line ends at the byte 0x0A, but that's not true for UTF-16: a line feed is encoded as two bytes (0x0A 0x00 or 0x00 0x0A), and the byte 0x0A can also appear inside the encodings of other characters.
And their outputs differ. The first outputs a mix of iso-latin-1 and UTF-8, depending on how each string happens to be stored internally. The second program outputs UTF-16le or UTF-16be, probably the latter.
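To make both failure modes concrete, here's a minimal sketch (the byte strings are my own illustrations, not taken from the original programs):

use strict;
use warnings;

use Encode qw( decode encode );

# Failure mode 1: decoding without encoding the output. Printing a
# string that contains a character above 0xFF (here U+20AC, the euro
# sign) to a handle with no encoding layer warns "Wide character in
# print" and emits the internal representation.
my $str = decode('UTF-16BE', "\x20\xAC");
print $str, "\n";

# Failure mode 2: assuming a line ends at the byte 0x0A. In UTF-16LE,
# "a\nb" is six bytes, and the line feed occupies two of them, so
# cutting at the lone 0x0A byte leaves an odd-length fragment that
# can't be decoded.
my $bytes = encode('UTF-16LE', "a\nb");   # 61 00 0A 00 62 00
my ($fragment) = $bytes =~ /^(.*?\x0A)/s;
printf "%v02X\n", $fragment;              # 61.00.0A -- mid-character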
Fixed version of the first program:

use strict;
use warnings;

use open ':std', ':locale';

my $file = shift;
my $enc  = "UTF-16";

# Let the I/O layer do the decoding; 'UTF-16' reads the leading BOM
# to pick the byte order.
open(my $fh, "<:encoding($enc)", $file)
    || die("Can't: $!");

while ( <$fh> ) {
    print;
}
And of the second:

use strict;
use warnings;

use Encode qw( decode );

use open ':std', ':locale';

my $file = shift;
my $enc  = "UTF-16";

# Slurp the whole file as raw bytes, then decode it in one go.
my $bytes = do {
    open(my $fh, "<:raw", $file)
        || die("Can't: $!");
    local $/;
    <$fh>;
};

my $str = decode($enc, $bytes);
print $str;
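As a sanity check of the BOM handling (a sketch; the byte strings are illustrative), the endianness-agnostic 'UTF-16' encoding consumes a leading BOM and uses it to pick the byte order:

use strict;
use warnings;

use Encode qw( decode );

# FF FE is the little-endian BOM, FE FF the big-endian one; the BOM
# itself is consumed and doesn't appear in the decoded string.
my $from_le = decode('UTF-16', "\xFF\xFE\x41\x00");   # "A"
my $from_be = decode('UTF-16', "\xFE\xFF\x00\x41");   # "A"
print "$from_le$from_be\n";                           # AA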
"The output seems to have a space between every char"
That's usually a sign of UTF-16le/UTF-16be/UCS-2le/UCS-2be being treated as ASCII or a derivative like iso-latin-1 or UTF-8.
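You can see where the gaps come from with a small sketch (the sample string is an assumption): in UTF-16LE, every ASCII-range character is paired with a NUL byte, which many viewers render as a blank.

use strict;
use warnings;

use Encode qw( encode );

# "Hello" in UTF-16LE: each ASCII character is followed by a NUL byte.
my $bytes = encode('UTF-16LE', "Hello");
printf "%v02X\n", $bytes;   # 48.00.65.00.6C.00.6C.00.6F.00

Treat those bytes as latin-1 and the NULs show up as the "spaces" between the letters.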
In reply to Re^5: Decoding bad UTF-16 by ikegami, in thread Decoding bad UTF-16 by gregality