The flag FB_QUIET seems to be the answer; providing a callback helps identify the malformed bytes.
Unfortunately, that doesn't always seem to be the case:
    use warnings;
    use strict;
    use Encode qw/decode/;
    use Data::Dump;
    dd decode('UTF-16-BE', "\xD8\x3D\xDD\xFA", Encode::FB_QUIET|Encode::LEAVE_SRC);
    dd decode('UTF-16-BE', "\x3D\xDD\xFA",     Encode::FB_QUIET|Encode::LEAVE_SRC);
    dd decode('UTF-16-BE', "\x3D\xDD\xFA",     sub { sprintf "<U+%04X>", shift });
    dd decode('UTF-16-LE', "\x3D\xD8\xFA\xDD", Encode::FB_QUIET|Encode::LEAVE_SRC);
    dd decode('UTF-16-LE', "\xD8\xFA\xDD",     Encode::FB_QUIET|Encode::LEAVE_SRC);
    dd decode('UTF-16-LE', "\xD8\xFA\xDD",     sub { sprintf "<U+%04X>", shift });
    __END__
    "\x{1F5FA}"
    "\x{3DDD}"
    "\x{3DDD}"
    "\x{1F5FA}"
    "\x{FAD8}"
    "\x{FAD8}"
It could be argued that this is a bug or oversight in Encode, and of course if we know we're reading UTF-16 we should always read an even number of bytes. Still, because of this behavior, and because I don't yet know whether there are other encodings where chopped-up byte sequences can end up looking like valid characters, I'm doubtful that a generalized "read any file backwards with Encode" is reliable. Personally I'd just write a version for UTF-8 and UTF-16 (and any others as needed); other encodings can be converted to the supported ones.
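For example, a UTF-16-specific backwards reader might look something like the following sketch. This is just an illustration of the idea, not tested production code: the filename is hypothetical, it assumes well-formed UTF-16LE input without a BOM, and it skips most error handling. The trick is to step back two bytes at a time and, when the code unit turns out to be a low surrogate, include the preceding code unit so the surrogate pair gets decoded together.

    use warnings;
    use strict;
    use Fcntl qw/SEEK_SET/;
    use Encode qw/decode/;

    # Sketch only: walk a UTF-16LE file backwards one character at a time.
    # Assumes well-formed UTF-16LE with no BOM; most error handling omitted.
    my $file = 'example.txt';             # hypothetical filename
    open my $fh, '<:raw', $file or die "$file: $!";
    my $pos = -s $fh;                     # byte offset of end-of-file
    die "odd number of bytes\n" if $pos % 2;
    while ( $pos > 0 ) {
        $pos -= 2;                        # step back one code unit
        seek $fh, $pos, SEEK_SET or die "seek: $!";
        my $buf;
        read( $fh, $buf, 2 ) == 2 or die "short read";
        my $unit = unpack 'v', $buf;      # little-endian 16-bit code unit
        if ( $unit >= 0xDC00 && $unit <= 0xDFFF ) {  # low surrogate, so the
            $pos -= 2;                    # preceding unit must be the high one
            seek $fh, $pos, SEEK_SET or die "seek: $!";
            read( $fh, $buf, 4 ) == 4 or die "short read";
        }
        my $char = decode( 'UTF-16LE', $buf, Encode::FB_CROAK );
        printf "U+%04X\n", ord $char;     # report each character, last first
    }

A UTF-16BE version would be the same except for unpack 'n' and the encoding name, and a UTF-8 version would instead scan backwards past continuation bytes (0b10xxxxxx) to find the start of each character.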
In reply to Re^5: Processing an encoded file backwards by haukex
in thread Processing an encoded file backwards by LanX