Length of unpacked string for "hex" data

SirBones has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I'm writing a data file dumper. Gee, that's never been done in Perl before, has it? :-) I've looked around a bit for a simple answer to this, but I admit to being pretty new to pack/unpack intricacies, so I humbly beg your pardon if I'm missing something obvious.

The files in question are constructed (besides some fixed-length header records) with repeating fields consisting of a 2-byte ASCII keyword, a 1-byte length (byte count), and a data field (whose size is given by the length field). The data field can contain either ASCII or binary data, depending on the keyword. For example:

RT | 0x07 | testing
CD | 0x08 | 0x01020304FFDDEC19
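A minimal sketch of walking a buffer laid out as described above (2-byte keyword, 1-byte byte count, then that many data bytes), assuming the fields are simply concatenated with no padding between them. The helper name `parse_fields` is hypothetical; it keeps the data bytes raw so that the display decision (ASCII or hex) can be made separately per keyword.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a buffer of repeated
# keyword (2 bytes) / length (1 byte) / data (length bytes) fields.
sub parse_fields {
    my ($buf) = @_;
    my @fields;
    my $pos = 0;
    while ($pos < length $buf) {
        die "truncated header at offset $pos\n" if $pos + 3 > length $buf;
        # 'A2' = keyword, 'C' = unsigned 1-byte count of data bytes
        my ($kw, $len) = unpack 'A2 C', substr($buf, $pos, 3);
        die "truncated data for $kw at offset $pos\n"
            if $pos + 3 + $len > length $buf;
        my $data = substr($buf, $pos + 3, $len);   # raw bytes, untouched
        push @fields, [ $kw, $data ];
        $pos += 3 + $len;
    }
    return @fields;
}

# The two example fields from above, concatenated into one buffer.
my $buf = pack('A2 C a7', 'RT', 7, 'testing')
        . pack('A2 C H16', 'CD', 8, '01020304FFDDEC19');

for my $f (parse_fields($buf)) {
    printf "%s has %d data bytes\n", $f->[0], length $f->[1];
}
```

Keeping the length as a byte count and slicing with `substr` sidesteps the nibble question entirely at the parsing stage.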

Perfect for "unpack", correct? My only hangup is trying to generalize the loop that spits out the keywords and data. The data needs to be displayed as either hex or ASCII, depending on the keyword. So I have a hash which matches either an "A" or an "H" with each keyword. I then use the contents of the hash as the type specifier in the unpack template:

C/$kwfmt{$kw}

When the data is ASCII, this works fine. But when it's hex, I only get half of my string displayed because the "H" format uses the number of nibbles as its length, whereas my length specification is (of course) in bytes. Of course I can (and currently do) handle this in a kludgy way by checking the format during the loop and then using a different template based on "A" or "H"; but it's ugly.
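A small demonstration of the nibble-versus-byte behaviour described above, using the same 8-byte value as the example records. It shows that a count of 8 on `H` yields only the first 4 bytes, and that `unpack 'H'` produces lower-case hex digits.

```perl
use strict;
use warnings;

my $bytes = pack 'H16', '01020304FFDDEC19';   # 8 bytes of binary data

# The count on 'H' is a number of hex digits (nibbles), not bytes.
my $half = unpack 'H8',  $bytes;   # 8 nibbles  = first 4 bytes
my $full = unpack 'H16', $bytes;   # 16 nibbles = all 8 bytes
my $all  = unpack 'H*',  $bytes;   # '*' = every remaining byte

print "$half\n$full\n$all\n";
```

This prints `01020304` followed by `01020304ffddec19` twice, which matches the truncated `CD 01020304` output shown further down.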

I tried throwing in a repeat value after the C/$kwfmt{$kw} but of course then Perl says I can't use a count with the "/" specifier. I've also tried various means of doubling the "C" field within the template when the data is hex but that hasn't worked out either.
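One way to keep a single generic loop, sketched here under the assumption that every keyword has an entry in the format hash, is to read the byte count first and scale it to whatever unit the format letter expects before building the template. The helper name `field_data` is hypothetical; this is essentially the "different template per format" approach, just confined to one place.

```perl
use strict;
use warnings;

my %kwfmt = (RT => 'A', CD => 'H');   # subset of the hash in the question

# Read the byte count first, then scale it to the unit the format expects:
# 'A' counts characters (one per byte), 'H' counts nibbles (two per byte).
sub field_data {
    my ($rec) = @_;
    my ($kw, $len) = unpack 'A2 C', $rec;
    my $fmt   = $kwfmt{$kw};                 # assumes the keyword is known
    my $count = $fmt eq 'H' ? 2 * $len : $len;
    my ($data) = unpack "x3 $fmt$count", $rec;   # skip keyword + length byte
    return ($kw, $data);
}

for my $rec (pack('A2 C A7', 'RT', 7, 'testing'),
             pack('A2 C H16', 'CD', 8, '01020304FFDDEC19')) {
    my ($kw, $data) = field_data($rec);
    print "$kw $data\n";
}
```

This prints `RT testing` and `CD 01020304ffddec19`.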

Here's a small demo to exemplify my dilemma. It prints the ASCII string correctly, but just half of the hex string:

#!/usr/bin/perl -w

use strict;
# Some example "keywords" and how they should be displayed
my %kwfmt = (
  "RT" => "A",
  "PN" => "A",
  "SN" => "A",
  "AB" => "H",
  "CD" => "H",
  "B1" => "H",
);

# The kind of thing I will see in my file
my @record;
$record[0] = pack ("A2CA7", "RT", 7, "testing");
$record[1] = pack ("A2CH16", "CD", 8, "01020304FFDDEC19");

# Prints ASCII fields fine, truncates hex
for (my $i=0; $i<2; $i++) {
  my ($kw) = unpack("A2", $record[$i]);
  my ($rdata) = unpack("x2C/$kwfmt{$kw}", $record[$i]);
  print "$kw $rdata\n";
}

The output of which is:

RT testing
CD 01020304

As a side note, and probably displaying my ignorance, I wonder why the "H" specifier deals with nibbles as its basic unit and not bytes. Dealing with "hex" data by the byte would seem to be the far more common operation.

Thanks (as usual) so much.

Ken

"This bounty hunter is my kind of scum: Fearless and inventive." --J.T. Hutt


Re: Length of unpacked string for "hex" data by ikegami (Patriarch) on Apr 24, 2006 at 20:07 UTC
Extract the field, then convert to hex if needed:

my %hex = map { $_ => 1 } qw( AB CD B1 );

my @records = (
  pack("A2Ca7", "RT", 7, "testing"),
  pack("A2CH16", "CD", 8, "01020304FFDDEC19"),
);

foreach my $record (@records) {
  my ($kw, $rdata) = unpack("A2C/a", $record);
  $rdata = uc(unpack('H*', $rdata)) if $hex{$kw};
  print "$kw $rdata\n";
}

or use a dispatch table to avoid the `if`:

sub format_as_text { return $_[0]; }
sub format_as_hex  { return uc(unpack('H*', $_[0])); }

my %kwfmt = (
  RT => \&format_as_text,
  PN => \&format_as_text,
  SN => \&format_as_text,
  AB => \&format_as_hex,
  CD => \&format_as_hex,
  B1 => \&format_as_hex,
);

my @records = (
  pack("A2Ca7", "RT", 7, "testing"),
  pack("A2CH16", "CD", 8, "01020304FFDDEC19"),
);

foreach (@records) {
  my ($kw, $rdata) = unpack("A2C/a", $_);
  $rdata = $kwfmt{$kw}->($rdata);
  print "$kw $rdata\n";
}

By the way, there's no reason to hardcode the number of elements in the array. Instead of `for (my $i=0; $i<2; $i++)` you should use `for (my $i=0; $i<@records; $i++)`. Or better yet, use `for my $i (0..$#records)` since it's easier to read and just as efficient. I used `foreach my $record (@records)` since it's even simpler and we didn't care about the record index.
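For a dumper it can help to show one hex pair per byte. A sketch of a hypothetical extra formatter that could sit in a dispatch table like the one above; it uses `unpack 'C*'` plus `sprintf` rather than `H*`, so it works on the raw bytes returned by `C/a` and produces upper-case output directly.

```perl
use strict;
use warnings;

# Hypothetical variant of format_as_hex: one upper-case hex pair per byte,
# separated by spaces, which can be easier to read in a dump.
sub format_as_spaced_hex {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

my $data = pack 'H16', '01020304FFDDEC19';
print format_as_spaced_hex($data), "\n";   # 01 02 03 04 FF DD EC 19
print uc(unpack('H*', $data)), "\n";       # 01020304FFDDEC19
```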
Re^2: Length of unpacked string for "hex" data by SirBones (Friar) on Apr 24, 2006 at 22:52 UTC
Thanks, very cool. And I like the dispatch table trick as well; seems I have another app where that will be useful. For the present case I'll stick with your "if"; it's less intrusive than mine.

I noticed you used a lower-case "a" in the template:

"A2C/a"

meaning to parse a null-padded string rather than a space-padded one. Is that important here?

Cheers,
-Ken
Re^3: Length of unpacked string for "hex" data by ikegami (Patriarch) on Apr 24, 2006 at 23:01 UTC
'A' will remove trailing NULs and whitespace. 'a' will not. In other words,

($str) = unpack('C/A', $data);

is equivalent to

($str) = unpack('C/a', $data);
$str =~ s/[\0\s]+$//;
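A quick check of why this matters for binary fields: with `A`, a data field whose last byte happens to be 0x00 or 0x20 (or another whitespace byte) silently loses it. The sample values below are made up purely for illustration.

```perl
use strict;
use warnings;

# Length byte 6, then "ab" followed by two spaces and two NULs.
my $text = pack('C', 6) . "ab  \0\0";
my ($text_A) = unpack 'C/A', $text;   # trailing spaces and NULs stripped
my ($text_a) = unpack 'C/a', $text;   # bytes returned exactly as stored

# Binary field: length byte 2, data bytes 0x01 0x00.
my $bin = pack('C', 2) . "\x01\x00";
my ($bin_A) = unpack 'C/A', $bin;     # loses the trailing 0x00
my ($bin_a) = unpack 'C/a', $bin;     # keeps both bytes

printf "text: A=%d a=%d, binary: A=%d a=%d\n",
    length($text_A), length($text_a), length($bin_A), length($bin_a);
```

This prints `text: A=2 a=6, binary: A=1 a=2`, so `a` is the safer choice when the field may hold binary data that is later converted to hex.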
Re: Length of unpacked string for "hex" data by wedgef5 (Scribe) on Apr 24, 2006 at 18:54 UTC
I'm far from an expert on pack/unpack myself. In fact, I saw your post as a chance to learn a little myself! I think perhaps you should make the 'C' an 'I' in your templates, and then specify the hex data as 16 (hex digits) in length. The following worked for me...at least I got all 8 bytes of the hex back.

$record[0] = pack ("A2IA7", "RT", 7, "testing");
$record[1] = pack ("A2IH16", "CD", 16, "01020304FFDDEC19");

# Prints both fields in full
for (my $i=0; $i<2; $i++) {
  my ($kw) = unpack("A2", $record[$i]);
  my ($rdata) = unpack("x2I/$kwfmt{$kw}", $record[$i]);
  print "$kw $rdata\n";
}
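A small check, assuming the same 8-byte value, suggests that what recovers all 8 bytes here is the stored count of 16 (a nibble count) rather than the switch from `C` to `I`: the `/` construct hands whatever number it reads to `H` as a nibble count, whichever integer format holds it. Note that this means writing a nibble count into the length field, which the files described in the question do not do.

```perl
use strict;
use warnings;

# Same 8 data bytes, with the count 16 stored first either as a
# 1-byte 'C' or as a native unsigned int 'I'.
my $rec_C = pack 'A2 C H16', 'CD', 16, '01020304FFDDEC19';
my $rec_I = pack 'A2 I H16', 'CD', 16, '01020304FFDDEC19';

# In both cases '/' passes the stored value (16) to 'H' as a nibble count.
my ($via_C) = unpack 'x2 C/H', $rec_C;
my ($via_I) = unpack 'x2 I/H', $rec_I;

print "$via_C\n$via_I\n";
```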
Re^2: Length of unpacked string for "hex" data by SirBones (Friar) on Apr 24, 2006 at 19:54 UTC
Thanks for the suggestion. Of course I've managed to present a bad example. Your suggestion of "I" in the template only works if the hex string is 8 bytes or shorter (from what I can see). In reality my hex strings may be significantly longer.

Ken
Re: Length of unpacked string for "hex" data by swampyankee (Parson) on Apr 24, 2006 at 21:35 UTC
As a side note, and probably displaying my ignorance, I wonder why the "H" specifier deals with nibbles as its basic unit and not bytes. Dealing with "hex" data by the byte would seem to be the far more common operation.

I would suspect dealing with nybbles, vice bytes, makes it easier when one has to deal with BCD (binary-coded decimal).

emc

"Being forced to write comments actually improves code, because it is easier to fix a crock than to explain it." --G. Steele
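An illustration of the BCD point, assuming a hypothetical date stored as packed BCD (one decimal digit per nibble). Because `H` counts nibbles, the digits come back directly, and an odd number of digits can be extracted without any extra arithmetic.

```perl
use strict;
use warnings;

# Packed BCD: each nibble holds one decimal digit, two digits per byte.
my $bcd    = pack 'H8', '20060424';          # bytes 0x20 0x06 0x04 0x24
my $digits = unpack 'H8', $bcd;              # "20060424"
my ($y, $m, $d) = unpack 'A4 A2 A2', $digits;
print "$y-$m-$d\n";                          # 2006-04-24

# Odd digit counts fall out naturally: 5 digits stored in 3 bytes.
my $odd = unpack 'H5', pack('H6', '123450');
print "$odd\n";                              # 12345
```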
Re: Length of unpacked string for "hex" data by ikegami (Patriarch) on Apr 24, 2006 at 21:59 UTC
As a side note, and probably displaying my ignorance, I wonder why the "H" specifier deals with nibbles as its basic unit and not bytes. Dealing with "hex" data by the byte would seem to be the far more common operation.

The repeat count pertains to the number of inputs for `pack` and to the number of outputs for `unpack`. This is consistent across (almost?) all formats.

For `pack 'H'`, the repeat count translates to a nibble count because each character passed to `pack 'H'` contains one nibble of information. Similarly, for `unpack 'H'`, the repeat count translates to a nibble count because each character returned by `unpack 'H'` contains one nibble of information.
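A short sketch of that rule in action: the count on `H` is the number of hex characters consumed by `pack` or produced by `unpack`, and `pack` pads up to whole bytes, while for `A` the characters happen to be bytes.

```perl
use strict;
use warnings;

# pack 'H3' consumes 3 input characters (nibbles) and pads to whole bytes.
my $packed = pack 'H3', 'abc';          # "\xAB\xC0", 2 bytes
# unpack 'H3' returns 3 output characters.
my $hex3 = unpack 'H3', $packed;        # "abc"
# 'A3' likewise counts characters, which for 'A' are one per byte.
my $str3 = unpack 'A3', 'testing';      # "tes"

printf "%d bytes, '%s', '%s'\n", length($packed), $hex3, $str3;
```

This prints `2 bytes, 'abc', 'tes'`.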