Re: utf8::decode vs. Encode::decode with regard to the length function

use strict;
use warnings;
use feature qw( say );

use Encode qw( );

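# UTF-8 bytes for U+8AC6 followed by a newline: 4 characters.
my $orig = "\xE8\xAB\x86\x0A";

# Encode twice, so that decoding twice should give back $orig.
utf8::encode( my $enc_once = $orig );
utf8::encode( my $enc_twice = $enc_once );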
say('length = ', length($orig));


{
   say("Using utf8::decode");
   utf8::decode( my $dec_once = $enc_twice );
   utf8::decode( my $dec_twice = $dec_once );
   say('length = ', length($dec_twice));
   say($orig eq $dec_twice ? 'ok' : 'not ok');
}

{
   say("Using Encode::decode");
   my $dec_once = Encode::decode('UTF-8', $enc_twice);
   my $dec_twice = Encode::decode('UTF-8', $dec_once);
   say('length = ', length($dec_twice));
   say($orig eq $dec_twice ? 'ok' : 'not ok');
}

length = 4
Using utf8::decode
length = 4
ok
Using Encode::decode
length = 4
ok

Works fine for me, both ways.

I'll take your word for it that you are experiencing a problem, but I'm not going to comb through the hundreds of lines you posted to find what it is. If this doesn't help, please post a minimal demonstration of the problem.
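For reference, the two decoders in the script above differ in how they deliver their result, which matters for the length behavior discussed further down the thread. utf8::decode converts its argument in place and returns true or false depending on whether the input was valid UTF-8, while Encode::decode leaves its input alone and returns a new string. A minimal sketch (the variable names are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw( );

# UTF-8 bytes for U+8AC6 followed by a newline (4 bytes).
my $bytes = "\xE8\xAB\x86\x0A";

# utf8::decode: modifies its argument in place, returns a boolean.
my $in_place = $bytes;
my $ok = utf8::decode($in_place);
print 'utf8::decode returned ', ($ok ? 'true' : 'false'),
      ', length now ', length($in_place), "\n";

# Encode::decode: the input is untouched, the result is a new scalar.
my $returned = Encode::decode('UTF-8', $bytes);
print 'Encode::decode result length ', length($returned),
      ', input length still ', length($bytes), "\n";

# Invalid UTF-8 (a truncated 3-byte sequence): utf8::decode returns
# false and leaves the string as it was.
my $bad = "\xE8\xAB";
my $bad_ok = utf8::decode($bad);
print 'utf8::decode on invalid input returned ',
      ($bad_ok ? 'true' : 'false'), ', length ', length($bad), "\n";
```

Both decoders yield the same two-character string for valid input; the difference is where the result ends up.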

Re^2: utf8::decode vs. Encode::decode with regard to the length function by Anonymous Monk on Dec 03, 2010 at 00:01 UTC
Thanks for the response. Here's a more succinct example that demonstrates this issue in a different way.

#!/usr/bin/perl

use strict;
use warnings;

use Encode qw( );

my $orig = "\xE8\xAB\x86\x0A";

utf8::encode( my $enc_once = $orig );
utf8::decode( $enc_once );
print('length after first decode= ', length($enc_once), "\n");
utf8::decode( $enc_once );
print('length after second decode= ', length($enc_once), "\n");

# do it again but don't check the intermediate length
utf8::encode( $enc_once = $orig );
utf8::decode( $enc_once );
utf8::decode( $enc_once );
print('length after second decode= ', length($enc_once), "\n");

Here's the output:

length after first decode= 4
length after second decode= 4
length after second decode= 2

Apparently checking the length before decoding again changes the result of calling length again, which I don't understand.
Re^3: utf8::decode vs. Encode::decode with regard to the length function by dave_the_m (Monsignor) on Dec 03, 2010 at 12:32 UTC
That looks like a length-caching bug, and it's still present in blead (Perl's development branch). Can you perlbug this, please?

Dave.
Re^4: utf8::decode vs. Encode::decode with regard to the length function by ikegami (Patriarch) on Dec 03, 2010 at 18:46 UTC
RT#80190
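To look at the cached value dave_the_m mentions, the core Devel::Peek module can dump a scalar's internals. The sketch below assumes a perl that caches the character length of UTF8-flagged strings in per-scalar magic, which is where a stale value could survive an in-place utf8::decode; in the dump after the first length() call, look for a MAGIC entry of type PERL_MAGIC_utf8. The exact dump format varies between perl versions, so no output is shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Devel::Peek qw( Dump );

my $orig = "\xE8\xAB\x86\x0A";

utf8::encode( my $str = $orig );
utf8::decode( $str );            # 4 characters, stored with the UTF8 flag on

my $len_before = length($str);   # computing the length may cache it
Dump($str);                      # written to STDERR; look for utf8 magic

utf8::decode( $str );            # decodes in place to 2 characters
my $len_after = length($str);    # 2 on a fixed perl; 4 with the caching bug
Dump($str);

print "length before: $len_before, after: $len_after\n";
```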
Re^3: utf8::decode vs. Encode::decode with regard to the length function by ikegami (Patriarch) on Dec 03, 2010 at 16:59 UTC
Yeah, it's a length-caching bug. As a test script:

#!/usr/bin/perl

use strict;
use warnings;

use Test::More tests => 8;

{  # Baseline.
   my $s = "\xE8\xAB\x86\x0A";
   utf8::downgrade($s);
   is(length($s), 4);
   is($s, "\xE8\xAB\x86\x0A");
   utf8::decode($s);
   is(length($s), 2);
   is($s, "\x{8AC6}\n");
}

{  # Check for length-caching bug.
   my $s = "\xE8\xAB\x86\x0A";
   utf8::upgrade($s);
   is(length($s), 4);
   is($s, "\xE8\xAB\x86\x0A");
   utf8::decode($s);
   is(length($s), 2);
   is($s, "\x{8AC6}\n");
}

1;

1..8
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
not ok 7
#   Failed test at a.pl line 15.
#          got: '4'
#     expected: '2'
ok 8
# Looks like you failed 1 test of 8.
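On an affected perl, one way to sidestep the stale length, assuming (as diagnosed above) that the cached value belongs to the scalar being decoded in place, is to avoid decoding that scalar in place: either let Encode::decode return a new string, as in the very first script of this thread, or decode a fresh copy. This is a sketch of a workaround, not the fix tracked in RT#80190, and the claim that a plain copy does not carry the cached length over is an assumption about perl internals worth checking on your own perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw( );

my $s = "\xE8\xAB\x86\x0A";
utf8::upgrade($s);
my $len_before = length($s);    # 4; on a UTF8-flagged string this may be cached

# Option 1: Encode::decode builds and returns a brand-new scalar.
my $via_encode = Encode::decode('UTF-8', $s);

# Option 2: copy first, then decode the copy in place.
# (Assumption: plain assignment does not copy the length cache.)
my $copy = $s;
utf8::decode($copy);

print 'Encode::decode: ', length($via_encode), "\n";
print 'decoded copy:   ', length($copy), "\n";
```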