Re^2: Understanding pack and unpack changes for binary data between 5.8 and 5.10

It's a bit strange, but the internal representation of the string shouldn't* matter.

What I do find very strange is that it doesn't croak when passed non-bytes.

use strict;
use warnings;

use Data::Dumper qw( Dumper );

$Data::Dumper::Useqq  = 1;
$Data::Dumper::Terse  = 1;
$Data::Dumper::Indent = 0;

my $s = chr(0xC9);
utf8::downgrade($s);
print(Dumper(pack('V/a*', $s)), "\n");
utf8::upgrade($s);
print(Dumper(pack('V/a*', $s)), "\n");

print(Dumper(pack('V/a*', "\x{C9}\x{2660}")), "\n");

5.10.0:

"\1\0\0\0\311"             # Ok
"\1\0\0\0\x{c9}"           # Ok
"\2\0\0\0\x{c9}\x{2660}"   # Does this make sense???

On the other hand, 5.8.8 was very broken:

"\1\0\0\0\311"             # Ok
"\1\0\0\0\303"             # XXX
"\2\0\0\0\303\242"         # XXX
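One way to check whether a packed result really consists of bytes is to try downgrading it. This is only a sketch and assumes 5.10 or later, where `pack` with `a*` keeps the wide character as shown in the 5.10.0 output above. The variable names are mine.

```perl
use strict;
use warnings;

# Same input as the third test: one character below 0x100, one above.
my $packed = pack('V/a*', "\x{C9}\x{2660}");

# A true second argument makes utf8::downgrade return false on failure
# instead of dying. Failure means some element is not a byte.
my $is_bytes = utf8::downgrade($packed, 1) ? 1 : 0;
print "all bytes: $is_bytes\n";

# Find the largest element to see which character is the culprit.
my ($max) = sort { $b <=> $a } map { ord } split //, $packed;
printf "largest element: U+%04X\n", $max;
```

If the result were a proper binary string, the downgrade would succeed and the largest element would be at most 0xFF.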

* I realize it matters all too often, but that's getting fixed. In places where it does matter, you can use utf8::upgrade and utf8::downgrade to control the internal format.
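A small sketch of what "control the internal format" means in practice: the two functions change how the string is stored, not what it contains. The variable names here are just for illustration.

```perl
use strict;
use warnings;

# One character, U+00C9, which fits in either internal format.
my $s = chr(0xC9);
utf8::downgrade($s);       # store it as a single byte

my $copy = $s;
utf8::upgrade($copy);      # store the copy as UTF-8 internally

# At the Perl level the two strings are indistinguishable.
print "equal: ",  ($s eq $copy ? "yes" : "no"), "\n";
print "length: ", length($s), " and ", length($copy), "\n";

# Only the internal flag differs.
print "utf8 flag: ", (utf8::is_utf8($s)    ? 1 : 0),
      " and ",       (utf8::is_utf8($copy) ? 1 : 0), "\n";
```

Code that behaves differently for `$s` and `$copy` is depending on the internal format, which is exactly the kind of place the footnote is talking about.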


Replies are listed 'Best First'.
Re^3: Understanding pack and unpack changes for binary data between 5.8 and 5.10 by squentin (Sexton) on Mar 12, 2009 at 15:54 UTC
The problem shows up when I call length on the return value. Of course I should have used "bytes", but as I said, the return value is a binary string, so returning a length in utf8 characters is strange. And what's great with this bug is that you only see it when the original string has multi-byte characters or when it is long enough. :)

use Encode qw/_utf8_on/;
use bytes ();   # load bytes::length without enabling byte semantics

my $a="bj\xc3\xb6rk";
_utf8_on($a);
my $binarystring=pack("V/a", $a);
warn length $binarystring;
warn bytes::length $binarystring;

my $b="b"x1000;
_utf8_on($b);
my $binarystring2=pack("V/a", $b);
warn length $binarystring2;
warn bytes::length $binarystring2;
Re^4: Understanding pack and unpack changes for binary data between 5.8 and 5.10 by ikegami (Patriarch) on Mar 12, 2009 at 16:59 UTC
`$a` is 5 bytes long and `pack("V")` is 4 bytes long, so `$binarystring` should hold 9 bytes. `length($binarystring)` confirms the length, and `utf8::downgrade` would confirm that they are bytes.

`$b` is 1000 bytes long and `pack("V")` is 4 bytes long, so `$binarystring2` should hold 1004 bytes. `length($binarystring2)` confirms the length, and `utf8::downgrade` would confirm that they are bytes.

You wrote: "And what's great with this bug is that you only see it when the original string has multi-byte characters or when it is long enough. :)"

I don't see the problem. Are you expecting something other than 9 and 1004? Yes, the length of the internal representation is different (as reported by `bytes::length`), but why are you mucking with the internals? Speaking of mucking with internals, `utf8::decode` should normally be used instead of `_utf8_on`.

You wrote: "so returning a length in utf8 characters is strange."

It's a bit odd, but only because it's a bit inefficient.
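The arithmetic for the first case can be checked with a short sketch. It assumes 5.10 or later and builds the 5-character string with `utf8::upgrade` rather than `_utf8_on`. The variable names are mine.

```perl
use strict;
use warnings;

# 5 characters, all below 0x100, deliberately stored in the upgraded format.
my $name = "bj\x{F6}rk";
utf8::upgrade($name);

my $packed = pack('V/a*', $name);

# 4 elements for the V prefix plus 5 for the characters.
print length($packed), "\n";

# utf8::downgrade dies if any element is above 0xFF. It succeeds here,
# which confirms that every element is a byte.
my $bytes = $packed;
utf8::downgrade($bytes);
print length($bytes), "\n";
print((utf8::is_utf8($bytes) ? "upgraded" : "downgraded"), "\n");
```

Both lengths come out as 9. Only the storage format differs between `$packed` and `$bytes`, and that is the only thing `bytes::length` looks at.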
Re^5: Understanding pack and unpack changes for binary data between 5.8 and 5.10 by squentin (Sexton) on Mar 13, 2009 at 13:59 UTC
I needed the length of the string to write the string and its length to a binary file. I'm only using _utf8_on in this example; in the original code, the string already had its utf8 flag on (it was coming from gtk2, which uses utf8 everywhere), so I was expecting it to be utf8-encoded. I understand that my code was ambiguous because it depends on the internal representation. I wrote it a long time ago when I didn't have much experience in perl and didn't really know how utf8 was handled. But I don't think using a string in pack should result in something that depends on the internal representation of the string: the internal representation should be internal :) Honestly, I don't like how utf8 is handled in perl. It tries to do everything automagically, but this makes things less clear.
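For the "string plus its length in a binary file" case, one way to avoid depending on the internal representation is to encode the text to UTF-8 bytes explicitly before packing. This is only a sketch: `pack_text` is a hypothetical helper, and the in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: serialize a text string as a 32-bit little-endian
# byte count followed by its UTF-8 encoding.
sub pack_text {
    my ($text) = @_;
    my $octets = $text;
    utf8::encode($octets);            # characters -> UTF-8 bytes, done on the copy
    return pack('V/a*', $octets);     # the count is now a count of bytes
}

my $record = pack_text("bj\x{F6}rk"); # 5 characters, 6 bytes once encoded

# Write the record through a raw (binary) handle. An in-memory file is
# used here so the example does not touch the disk.
my $buffer = '';
open(my $fh, '>:raw', \$buffer) or die "open failed: $!";
print {$fh} $record;
close($fh) or die "close failed: $!";

print length($record), "\n";          # 4-byte prefix + 6 bytes of UTF-8
```

Because the input to `pack` is already a byte string, `length` and the on-disk size agree, whichever format the original text was stored in.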
Re^6: Understanding pack and unpack changes for binary data between 5.8 and 5.10 by ikegami (Patriarch) on Mar 13, 2009 at 15:10 UTC
Re^6: Understanding pack and unpack changes for binary data between 5.8 and 5.10 by ikegami (Patriarch) on Mar 13, 2009 at 16:10 UTC
Re^7: Understanding pack and unpack changes for binary data between 5.8 and 5.10 by squentin (Sexton) on Mar 13, 2009 at 21:26 UTC
Some notes below your chosen depth have not been shown here