Re^2: utf8::decode vs. Encode::decode with regard to the length function

Thanks for the response. Here's a more succinct example that demonstrates this issue in a different way.

#!/usr/bin/perl

use strict;
use warnings;
use Encode qw( );

my $orig = "\xE8\xAB\x86\x0A";
utf8::encode( my $enc_once = $orig );

utf8::decode( $enc_once );
print('length after first decode= ', length($enc_once), "\n");
utf8::decode( $enc_once );
print('length after second decode= ', length($enc_once), "\n");

# do it again but don't check the intermediate length
utf8::encode( $enc_once = $orig );

utf8::decode( $enc_once );
utf8::decode( $enc_once );
print('length after second decode= ', length($enc_once), "\n");

Here's the output:

length after first decode= 4
length after second decode= 4
length after second decode= 2

Apparently, calling length between the two decodes changes what length returns after the second decode, which I don't understand.


Re^3: utf8::decode vs. Encode::decode with regard to the length function by dave_the_m (Monsignor) on Dec 03, 2010 at 12:32 UTC
That looks like a length-caching bug, and it's still present in blead. Can you perlbug this please? Dave.
Re^4: utf8::decode vs. Encode::decode with regard to the length function by ikegami (Patriarch) on Dec 03, 2010 at 18:46 UTC
RT#80190
Re^3: utf8::decode vs. Encode::decode with regard to the length function by ikegami (Patriarch) on Dec 03, 2010 at 16:59 UTC
Yeah, it's a length-caching bug. As a test script:

#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 8;

{
    # Baseline.
    my $s = "\xE8\xAB\x86\x0A";
    utf8::downgrade($s);
    is(length($s), 4);
    is($s, "\xE8\xAB\x86\x0A");
    utf8::decode($s);
    is(length($s), 2);
    is($s, "\x{8AC6}\n");
}

{
    # Check for length-caching bug.
    my $s = "\xE8\xAB\x86\x0A";
    utf8::upgrade($s);
    is(length($s), 4);
    is($s, "\xE8\xAB\x86\x0A");
    utf8::decode($s);
    is(length($s), 2);
    is($s, "\x{8AC6}\n");
}

1;

Output:

1..8
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
not ok 7
#   Failed test at a.pl line 15.
#          got: '4'
#     expected: '2'
ok 8
# Looks like you failed 1 test of 8.
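For anyone wondering why 2 is the right answer after the second decode in the first script of this thread, here is a rough sketch that walks the same string through encode, decode, decode and lists its code points at each step. The code_points helper is hypothetical (it is not from any post above). It inspects the characters with split and ord rather than length, on the assumption that this sidesteps the cached length; I have not checked that against an affected perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: list a string's code points.
# It walks the characters with split instead of calling length().
sub code_points {
    my ($s) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $s;
}

my $s = "\xE8\xAB\x86\x0A";
print 'original:      ', code_points($s), "\n";

utf8::encode($s);    # each character above 0x7F becomes two UTF-8 bytes
print 'encoded:       ', code_points($s), "\n";

utf8::decode($s);    # back to the four original characters
print 'first decode:  ', code_points($s), "\n";

utf8::decode($s);    # E8 AB 86 is itself valid UTF-8, so it decodes again
print 'second decode: ', code_points($s), "\n";
```

The last line should list two code points, U+8AC6 and the newline, which matches the "\x{8AC6}\n" that the test script above expects. So the string contents are fine after the second decode; only the number reported by length is stale.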