NICE example for ambiguity of UTF-16 if you don't get the start right!!!
For clarification: your point is that "\xFA" (first block) and "\xDD" (second block) should raise errors?
use warnings; use strict; use Encode qw/decode/; use Data::Dump qw/dd/; dd decode('UTF-16-BE', "\x3D\xDD\xFA", Encode::FB_CROAK|Encode::LEAVE_ +SRC ); dd decode('UTF-16-BE', "\xFA", Encode::FB_CROAK|Encode::LEAVE_ +SRC ); dd decode('UTF-16-LE', "\xD8\xFA\xDD", Encode::FB_CROAK|Encode::LEAVE_ +SRC ); dd decode('UTF-16-LE', "\xDD", Encode::FB_CROAK|Encode::LEAVE_ +SRC); __END__ "\x{3DDD}" "" "\x{FAD8}" ""
I agree, looks like a bug we should report.
A really strange one too ...
> Personally I'd just make a version for UTF-8 and UTF-16, and any others as needed, or other encodings can be converted to the supported ones.
I don't even know other wide encodings except unicode , so I prefer relying on Encode for utf8 and make sure utf16 are read modulo 4 bytes 2 bytes.
From the docs
As of version 2.12, "Encode" supports coderef values for "CHECK"; +see below. NOTE: Not all encodings support this feature. Some encodings ignor +e the *CHECK* argument. For example, Encode::Unicode ignores *CHECK* and + it always croaks on error.
... but it doesn't croak
Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
FootballPerl is like chess, only without the dice
In reply to Re^6: Processing an encoded file backwards (updated)
by LanX
in thread Processing an encoded file backwards
by LanX
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |