in reply to Re: utf8::decode vs. Encode::decode with regard to the length function
in thread utf8::decode vs. Encode::decode with regard to the length function

Thanks for the response. Here's a more succinct example that demonstrates this issue in a different way.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode qw( );

    my $orig = "\xE8\xAB\x86\x0A";

    utf8::encode( my $enc_once = $orig );
    utf8::decode( $enc_once );
    print('length after first decode= ', length($enc_once), "\n");
    utf8::decode( $enc_once );
    print('length after second decode= ', length($enc_once), "\n");

    # do it again but don't check the intermediate length
    utf8::encode( $enc_once = $orig );
    utf8::decode( $enc_once );
    utf8::decode( $enc_once );
    print('length after second decode= ', length($enc_once), "\n");

Here's the output:

    length after first decode= 4
    length after second decode= 4
    length after second decode= 2

Apparently, checking the length between the two decodes changes what length returns after the second decode (4 instead of 2), which I don't understand.

Re^3: utf8::decode vs. Encode::decode with regard to the length function
by dave_the_m (Monsignor) on Dec 03, 2010 at 12:32 UTC
    That looks like a length-caching bug, and it's still present in blead. Can you perlbug this please?

    Dave.

Re^3: utf8::decode vs. Encode::decode with regard to the length function
by ikegami (Patriarch) on Dec 03, 2010 at 16:59 UTC

    Yeah, it's a length-caching bug.
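To see why a remembered character count goes stale here, the following is a minimal sketch that traces the scalar's internal state without going through length's character count. The `describe` helper is hypothetical (not from the thread); it relies only on `utf8::is_utf8` and `bytes::length`. The second `utf8::decode` first downgrades the string back to four bytes and then reinterprets them as UTF-8, so the character count drops from 4 to 2 while a count cached after the first decode would still say 4.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bytes ();    # loads bytes::length() without switching on byte semantics

# Hypothetical helper (not from the thread): report the UTF8 flag and the
# byte length of the internal buffer. Neither value is taken from the
# character count that length() may have cached.
sub describe {
    my ($s) = @_;
    return sprintf 'utf8_flag=%d bytes=%d',
        ( utf8::is_utf8($s) ? 1 : 0 ), bytes::length($s);
}

my $s = "\xE8\xAB\x86\x0A";

utf8::encode($s);    # 3 high characters become 2 bytes each, plus "\n"
print 'after encode:        ', describe($s), "\n";    # utf8_flag=0 bytes=7

utf8::decode($s);    # 7 bytes decode back to the 4 original characters
print 'after first decode:  ', describe($s), "\n";    # utf8_flag=1 bytes=7

utf8::decode($s);    # downgraded to 4 bytes, which are themselves valid UTF-8
print 'after second decode: ', describe($s), "\n";    # utf8_flag=1 bytes=4
```

The final state holds the bytes E8 AB 86 0A with the UTF8 flag on, i.e. the two characters "\x{8AC6}\n", which is why 2 is the correct length.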

    As a test script:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Test::More tests => 8;

        {
            # Baseline.
            my $s = "\xE8\xAB\x86\x0A";
            utf8::downgrade($s);
            is(length($s), 4);
            is($s, "\xE8\xAB\x86\x0A");
            utf8::decode($s);
            is(length($s), 2);
            is($s, "\x{8AC6}\n");
        }

        {
            # Check for length-caching bug.
            my $s = "\xE8\xAB\x86\x0A";
            utf8::upgrade($s);
            is(length($s), 4);
            is($s, "\xE8\xAB\x86\x0A");
            utf8::decode($s);
            is(length($s), 2);
            is($s, "\x{8AC6}\n");
        }

        1;

        1..8
        ok 1
        ok 2
        ok 3
        ok 4
        ok 5
        ok 6
        not ok 7
        #   Failed test at a.pl line 15.
        #          got: '4'
        #     expected: '2'
        ok 8
        # Looks like you failed 1 test of 8.
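On an affected perl, one way to cross-check a suspicious length is to count characters on a fresh copy of the string. This is only a sketch: `fresh_length` is a hypothetical helper, and it rests on the assumption that the cached character count is attached to the original scalar and is not carried over when the value is copied into a new lexical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread). Assumption: copying the value
# into a new lexical yields a scalar with no cached character count, so
# length() has to recount it.
sub fresh_length {
    my ($s) = @_;    # the copy is the whole point
    return length($s);
}

my $s = "\xE8\xAB\x86\x0A";
utf8::upgrade($s);
print 'length before decode: ', length($s), "\n";    # may populate the cache
utf8::decode($s);

my $direct = length($s);
my $fresh  = fresh_length($s);
print "length(\$s)       = $direct\n";
print "fresh_length(\$s) = $fresh\n";
print $direct == $fresh
    ? "counts agree\n"
    : "counts differ: stale cached length suspected\n";
```

On a perl where the bug has been fixed, both counts should agree, so a mismatch is the signal to look for.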