in reply to Re^5: Processing an encoded file backwards
in thread Processing an encoded file backwards

Haukex++

NICE example for ambiguity of UTF-16 if you don't get the start right!!!

For clarification: your point is that "\xFA" (first block) and "\xDD" (second block) should raise errors?

    use warnings;
    use strict;
    use Encode qw/decode/;
    use Data::Dump qw/dd/;

    dd decode('UTF-16-BE', "\x3D\xDD\xFA", Encode::FB_CROAK|Encode::LEAVE_SRC);
    dd decode('UTF-16-BE', "\xFA",         Encode::FB_CROAK|Encode::LEAVE_SRC);

    dd decode('UTF-16-LE', "\xD8\xFA\xDD", Encode::FB_CROAK|Encode::LEAVE_SRC);
    dd decode('UTF-16-LE', "\xDD",         Encode::FB_CROAK|Encode::LEAVE_SRC);

    __END__
    "\x{3DDD}"
    ""
    "\x{FAD8}"
    ""

I agree, looks like a bug we should report.

A really strange one too ...

> Personally I'd just make a version for UTF-8 and UTF-16, and any others as needed, or other encodings can be converted to the supported ones.

I don't even know of any wide encodings other than Unicode, so I prefer to rely on Encode for UTF-8 and to make sure UTF-16 is read in multiples of 2 bytes.
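A minimal sketch of that "multiple of 2 bytes" check, assuming the input is already a byte string in memory. The helper name decode_utf16_checked is made up for illustration and is not part of Encode. It only guards against an odd trailing byte; it does nothing about unpaired surrogates.

```perl
use strict;
use warnings;
use Encode qw/decode/;

# Hypothetical helper (not part of Encode): reject input whose length
# is not a multiple of 2 bytes before handing it to the decoder.
sub decode_utf16_checked {
    my ($encoding, $bytes) = @_;
    my $rest = length($bytes) % 2;
    die "$encoding: $rest trailing byte(s), not a whole code unit\n" if $rest;
    return decode($encoding, $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# One even-length input and one with a stray third byte.
for my $input ("\x3D\xDD", "\x3D\xDD\xFA") {
    my $text = eval { decode_utf16_checked('UTF-16BE', $input) };
    print defined $text ? "decoded ok\n" : "rejected: $@";
}
```

This way the result no longer depends on whether the installed Encode croaks on a partial character.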

update

From the docs

    As of version 2.12, "Encode" supports coderef values for "CHECK"; see below.

    NOTE: Not all encodings support this feature. Some encodings ignore the *CHECK* argument. For example, Encode::Unicode ignores *CHECK* and it always croaks on error.

... but it doesn't croak

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Re^7: Processing an encoded file backwards (updated)
by choroba (Cardinal) on Jan 18, 2020 at 23:13 UTC
    What Perl version? I'm getting

    UTF-16BE:Partial character at /home/choroba/1.pl line 8.
    in 5.26.1.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      That's what I get in git-bash with 5.26.2

      Probably a Windows problem?

      $ perl
      use warnings;
      use strict;
      use Encode qw/decode/;
      use Data::Dumper qw/Dumper/;

      warn Dumper decode('UTF-16-BE', "\x3D\xDD\xFA", Encode::FB_CROAK|Encode::LEAVE_SRC);
      warn Dumper decode('UTF-16-BE', "\xFA",         Encode::FB_CROAK|Encode::LEAVE_SRC);

      warn Dumper decode('UTF-16-LE', "\xD8\xFA\xDD", Encode::FB_CROAK|Encode::LEAVE_SRC);
      warn Dumper decode('UTF-16-LE', "\xDD",         Encode::FB_CROAK|Encode::LEAVE_SRC);
      __END__
      $VAR1 = "\x{3ddd}";
      $VAR1 = '';
      $VAR1 = "\x{fad8}";
      $VAR1 = '';

      MINGW64 ~
      $ perl -v

      This is perl 5, version 26, subversion 2 (v5.26.2) built for x86_64-msys-thread-multi


      > What Perl version?

      'This is perl 5, version 24, subversion 1 (v5.24.1) built for MSWin32-x64-multi-thread'

      does it die?


        Yes, with exit code 2.

        This is perl 5, version 26, subversion 1 (v5.26.1) built for x86_64-linux-thread-multi


        Just a note on this subthread: Encode can be upgraded separately from Perl.
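Since the Perl version alone may not tell the whole story, here is a small sketch for reporting the Encode version alongside it when comparing results. The sub name version_report is just an illustrative choice.

```perl
use strict;
use warnings;
use Encode ();
use Encode::Unicode ();

# Report the interpreter version together with the versions of Encode
# and Encode::Unicode (the module that handles the UTF-16 encodings).
sub version_report {
    return sprintf "perl %vd, Encode %s, Encode::Unicode %s",
        $^V, Encode->VERSION, Encode::Unicode->VERSION;
}

print version_report(), "\n";
```

Posting this line next to the test output would show whether two machines with similar Perl versions actually run the same Encode.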