Re^2: Understanding pack and unpack changes for binary data between 5.8 and 5.10

...It didn't do that in perl 5.8

Another difference to be aware of is this:

my $s = "\x{1234}\x{5678}";   # string with utf8 flag on

print unpack("H*", $s), "\n";

With 5.8 this prints a hexdump of the internal (UTF-8) representation of the string, which is useful when debugging encoding issues, for example:

e188b4e599b8

while with 5.10, you'd get

3478

i.e. the low-byte values of the codepoints, with the high-byte part being truncated. With warnings enabled, you also get "Character in 'H' format wrapped in unpack at...".
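
The 5.10 result can be reproduced by hand, assuming the "wrapping" amounts to keeping only the low 8 bits of each code point, which is what the output above suggests. `low_byte_hex` is a hypothetical helper used only for illustration; it is not part of Perl or of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: emulate what unpack("H*") appears to do with
# wide characters on 5.10, i.e. keep only the low byte of each code point.
sub low_byte_hex {
    my ($str) = @_;
    return join '', map { sprintf '%02x', ord($_) & 0xFF } split //, $str;
}

print low_byte_hex("\x{1234}\x{5678}"), "\n";   # prints 3478
```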

With use bytes, or when explicitly turning off the utf8 flag (update: as shown below), you get the old behaviour. And specifically for debugging encoding issues, Devel::Peek is the recommended alternative since 5.10, because of this difference.
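
A version-independent way to get the 5.8-style output, without touching the utf8 flag directly, is to encode a copy of the string to UTF-8 octets before unpacking. This is only a sketch: `utf8_hex` is a made-up helper name, and it dumps the UTF-8 encoding of the string, which matches the internal buffer for a string with the utf8 flag on but not necessarily for a downgraded one.

```perl
use strict;
use warnings;

# Hypothetical helper: hexdump of the UTF-8 encoding of a string.
sub utf8_hex {
    my ($str) = @_;        # work on a copy so the caller's string is untouched
    utf8::encode($str);    # in place: characters -> UTF-8 octets
    return unpack 'H*', $str;
}

print utf8_hex("\x{1234}\x{5678}"), "\n";   # prints e188b4e599b8
```

To inspect the real internal buffer and flags rather than a re-encoding, `Dump($s)` from Devel::Peek writes that information to STDERR.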

Re^3: Understanding pack and unpack changes for binary data between 5.8 and 5.10 by ikegami (Patriarch) on Mar 11, 2009 at 15:48 UTC
with 5.10, you'd get [...] the low-byte values of the codepoints, with the high-byte part being truncated. With warnings enabled, you also get "Character in 'H' format wrapped in unpack at...".

It's odd that it doesn't warn or croak with "Wide character in ...".

If you want to dump the internal buffer:

use Encode qw( _utf8_off );

sub internal { _utf8_off( my $s = shift ); return $s; }

my $s = "\x{1234}\x{5678}";   # string with utf8 flag on

print unpack("H*", internal($s)), "\n";

Update: Fixed error identified in reply.
Re^4: Understanding pack and unpack changes for binary data between 5.8 and 5.10 by almut (Canon) on Mar 11, 2009 at 16:23 UTC
`utf8::_utf8_off( my $s = shift );`

I think you meant `Encode::_utf8_off(...)`.
Re^3: Understanding pack and unpack changes for binary data between 5.8 and 5.10 by ikegami (Patriarch) on Mar 12, 2009 at 04:38 UTC
I don't see the problem.

use strict;
use warnings;
use Data::Dumper qw( Dumper );
$Data::Dumper::Useqq = 1;
$Data::Dumper::Terse = 1;
$Data::Dumper::Indent = 0;

my $s = chr(0xC9);

utf8::downgrade($s);
print(Dumper(unpack('H*', $s)), "\n");

utf8::upgrade($s);
print(Dumper(unpack('H*', $s)), "\n");

print(Dumper(unpack('H*', "\x{C9}\x{2660}")), "\n");

5.10.0:

"c9"    # Ok
"c9"    # Ok
Character in 'H' format wrapped in unpack at 750077.pl line 16.
"c960"  # GIGO

The internal representation is and should be irrelevant. If you want to see the internal representation, it stands to reason that you should have to explicitly fetch it.
Re^4: Understanding pack and unpack changes for binary data between 5.8 and 5.10 by almut (Canon) on Mar 12, 2009 at 09:24 UTC
I don't see the problem...

I don't see a problem either. I just pointed out a difference, i.e. that something which people might have gotten used to no longer behaves the way it did before...