Better unpack solution available

Dirk80 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I've written a little script and it is working. My input is a hexstream. It is constructed as follows:

once a null terminated string, i.e. Z*
many repetitions of: id, offset, size, path, crc32 (n n N A<offset-4> N) x ?

The problem is that each record contains a string of variable length (path). The length of the path is the value of the offset field minus 4.
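To make the layout concrete, here is a minimal sketch that packs a single record and then recovers the path length from the offset field. The helper name `build_record` is hypothetical and not part of the original script; the values are taken from the first record of the sample data, and the sketch assumes the offset field counts the path bytes plus the 4-byte crc32.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): packs one record
# in the layout described above: id, offset, size, path, crc32.
sub build_record {
    my ($id, $size, $path, $crc32) = @_;

    # Assumption: the offset field counts the path bytes plus the 4-byte crc32.
    my $offset = length($path) + 4;

    return pack('n n N a* N', $id, $offset, $size, $path, $crc32);
}

# Values taken from the first record of the sample data.
my $record = build_record(0x4100, 2989, '/bin/test20.dfl', 0x376F6B0F);

# Skip the 2-byte id, read the offset, and derive the path length from it.
my $offset = unpack('x2 n', $record);
printf("offset = %d, path length = %d\n", $offset, $offset - 4);

# Hex dump of the packed record, for comparison with the sample data.
print(uc(unpack('H*', $record)), "\n");
```

The printed hex dump should match the 27 bytes that follow the section name in the sample data.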

The following script is working.

#!/usr/bin/perl

use strict;
use warnings;

my $mem = "";
my ($section, $load_id, $offset_to_next_load, $load_size, $load_path, $crc32_load_path);

my $hex_data = "54424c004100001300000BAD2F62696E2F7465737432302E64666C376F6B0F42000013000000042F62696E2F7465737430312E64666C376D6C0F43000013000000042F62696E2F7465737430322E64666C376D6D0F44000013000000042F62696E2F7465737430332E64666C376D6E0F45000013000000042F62696E2F7465737430342E64666C376D6F0F46000013000000042F62696E2F7465737430352E64666C376D700F47000013000000042F62696E2F7465737430362E64666C376D710F48000013000000042F62696E2F7465737430372E64666C376D720F49000013000000042F62696E2F7465737430382E64666C376D730F4A000013000000042F62696E2F7465737430392E64666C376D740F4B000013000000042F62696E2F7465737431302E64666C376E6B0F4C000013000000042F62696E2F7465737431312E64666C376E6C0F4D000013000000042F62696E2F7465737431322E64666C376E6D0F4E000013000000042F62696E2F7465737431332E64666C376E6E0F4F000013000000042F62696E2F7465737431342E64666C376E6F0F50000013000000042F62696E2F7465737431352E64666C376E700F51000013000000042F62696E2F7465737431362E64666C376E710F52000013000000042F62696E2F7465737431372E64666C376E720F53000013000000042F62696E2F7465737431382E64666C376E730F54000013000000042F62696E2F7465737431392E64666C376E740F55000011000000042F6574632F74657374322E73683B0C08495600000D000000082F477053772E63686B1175D3BB";

# put hex data into memory
$mem = pack('H*', $hex_data );

# unpack null terminated section name from memory
$section = unpack('Z*', $mem);
$mem = substr($mem, length($section) + 1 );

# print section name
print $section . ":\n";
for (1 .. length($section) + 1){print "-"}
print "\n\n";

# unpack and print load id, offset, size, path (length of path is "offset - 4") and crc32 of each load entry
while (length($mem) > 0)
{
    ($load_id, $offset_to_next_load, $load_size) = unpack('n n N', $mem);
    $mem = substr($mem, 8);

    print "Load ID             : " . $load_id . "\n";
    print "Offset to next load : " . $offset_to_next_load . "\n";
    print "Load Size           : " . $load_size . "\n";

    ($load_path, $crc32_load_path) = unpack('A' . ($offset_to_next_load-4) . 'N', $mem);
    $mem = substr($mem, $offset_to_next_load);

    print "Load Path           : " . $load_path . "\n";
    print "CRC32 of Load Path  : " . $crc32_load_path . "\n";

    print "\n\n";
}

I would be very interested to know whether the hex data can be read with one elegant unpack command. I tried, but because of the variable length of the path I did not find a solution. Because of the repetitions, the variables $load_id, $offset_to_next_load and $load_size would of course have to be arrays instead of scalars, but I think I could handle that with a map command combined with the unpack command. My main problem is the variable length of the path.

Thank you for your help.

Dirk


Replies are listed 'Best First'.
Re: Better unpack solution available by ikegami (Patriarch) on Apr 09, 2010 at 15:46 UTC
It's not really your unpacks that are messy.

The comments parrot the obvious.
Concatenation is used when interpolation or `printf` would be clearer.
Some var names are needlessly long. ("`load`" doesn't add value.)
Each record's size and crc are printed, but that's not data.

But your unpacks could be improved by treating the records as having the following format:

header        body       footer
------------  ---------  ------
id body_size  size path  crc32

How I'd write it:

#!/usr/bin/perl

use strict;
use warnings;

my $mem = pack('H*', '54424c004100001300000BAD2F62696E2F7465737432302E64666C376F6B0F42000013000000042F62696E2F7465737430312E64666C376D6C0F43000013000000042F62696E2F7465737430322E64666C376D6D0F44000013000000042F62696E2F7465737430332E64666C376D6E0F45000013000000042F62696E2F7465737430342E64666C376D6F0F46000013000000042F62696E2F7465737430352E64666C376D700F47000013000000042F62696E2F7465737430362E64666C376D710F48000013000000042F62696E2F7465737430372E64666C376D720F49000013000000042F62696E2F7465737430382E64666C376D730F4A000013000000042F62696E2F7465737430392E64666C376D740F4B000013000000042F62696E2F7465737431302E64666C376E6B0F4C000013000000042F62696E2F7465737431312E64666C376E6C0F4D000013000000042F62696E2F7465737431322E64666C376E6D0F4E000013000000042F62696E2F7465737431332E64666C376E6E0F4F000013000000042F62696E2F7465737431342E64666C376E6F0F50000013000000042F62696E2F7465737431352E64666C376E700F51000013000000042F62696E2F7465737431362E64666C376E710F52000013000000042F62696E2F7465737431372E64666C376E720F53000013000000042F62696E2F7465737431382E64666C376E730F54000013000000042F62696E2F7465737431392E64666C376E740F55000011000000042F6574632F74657374322E73683B0C08495600000D000000082F477053772E63686B1175D3BB');

(my $section, $mem) = unpack('Z* a*', $mem);

print("$section\n");
print( ( "-" x length($section) ), "\n");

while (length($mem)) {
    (my $id, my $body, my $crc, $mem) = unpack('n n/a* N a*', $mem);
    my ($size, $path) = unpack('N a*', $body);

    # Check CRC here.

    print("Load ID : $id\n");
    print("Load Size : $size\n");
    print("Load Path : $path\n");
    print("\n");
}

Output:

TBL
---
Load ID : 16640
Load Size : 2989
Load Path : /bin/test20.dfl

Load ID : 16896
Load Size : 4
Load Path : /bin/test01.dfl

Load ID : 17152
Load Size : 4
Load Path : /bin/test02.dfl

...

Update: Added enumeration at the top.
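Taking the header/body/footer view one step further, the repetition itself can be folded into the template with a `(...)*` group, so that a single unpack call reads the section name and every record. The following is a minimal sketch, not a definitive implementation: it uses only the first two records of the sample data, and the names `@fields` and `@records` are illustrative. The body still needs a second unpack to split it into size and path.

```perl
use strict;
use warnings;

# Section name plus the first two records of the sample data.
my $mem = pack('H*',
      '54424c00'
    . '4100001300000BAD2F62696E2F7465737432302E64666C376F6B0F'
    . '42000013000000042F62696E2F7465737430312E64666C376D6C0F');

# Z*          : null-terminated section name
# (n n/a* N)* : id, length-prefixed body, crc32, repeated until the data ends
my ($section, @fields) = unpack('Z* (n n/a* N)*', $mem);

# @fields is a flat list of (id, body, crc) triples; regroup it into records.
my @records;
while (my ($id, $body, $crc) = splice(@fields, 0, 3)) {
    my ($size, $path) = unpack('N a*', $body);
    push @records, { id => $id, size => $size, path => $path, crc => $crc };
}

print("$section\n");
for my $r (@records) {
    print("$r->{id} $r->{size} $r->{path}\n");
}
```

The trade-off is that the whole buffer is decoded in one go, so a truncated final record is harder to detect than in the record-by-record loop.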
Re^2: Better unpack solution available by Dirk80 (Pilgrim) on Apr 09, 2010 at 21:25 UTC
Thank you very much for your great answer. I think trying it myself first and then studying a better solution is a good way to learn. Using 'a*' to read the rest of the memory, and so shortening it without substr, is a very good idea. It is also interesting that you simply interpreted the offset field as the length of a body, so that the length sits directly in front of the body. And one small thing I already knew but did not use: the 'x' operator to underline the string. So I learnt a lot from your post. Thank you.
Re^3: Better unpack solution available by Dirk80 (Pilgrim) on Apr 09, 2010 at 22:31 UTC
Hello,

It's me again. I have now written a new version of the script. The first unpack gets the length of the path. The second unpack reads one repetition of data in one swoop. I like this solution because it matches the original record layout most closely. Still, I will keep in mind that I can use the length/string technique from ikegami's post whenever the length is directly before the string.

#!/usr/bin/perl

use strict;
use warnings;

my $mem = pack('H*', '54424c004100001300000BAD2F62696E2F7465737432302E64666C376F6B0F42000013000000042F62696E2F7465737430312E64666C376D6C0F43000013000000042F62696E2F7465737430322E64666C376D6D0F44000013000000042F62696E2F7465737430332E64666C376D6E0F45000013000000042F62696E2F7465737430342E64666C376D6F0F46000013000000042F62696E2F7465737430352E64666C376D700F47000013000000042F62696E2F7465737430362E64666C376D710F48000013000000042F62696E2F7465737430372E64666C376D720F49000013000000042F62696E2F7465737430382E64666C376D730F4A000013000000042F62696E2F7465737430392E64666C376D740F4B000013000000042F62696E2F7465737431302E64666C376E6B0F4C000013000000042F62696E2F7465737431312E64666C376E6C0F4D000013000000042F62696E2F7465737431322E64666C376E6D0F4E000013000000042F62696E2F7465737431332E64666C376E6E0F4F000013000000042F62696E2F7465737431342E64666C376E6F0F50000013000000042F62696E2F7465737431352E64666C376E700F51000013000000042F62696E2F7465737431362E64666C376E710F52000013000000042F62696E2F7465737431372E64666C376E720F53000013000000042F62696E2F7465737431382E64666C376E730F54000013000000042F62696E2F7465737431392E64666C376E740F55000011000000042F6574632F74657374322E73683B0C08495600000D000000082F477053772E63686B1175D3BB');

# NOTE:
# Section name is a null terminated string.
# Z* is setting the memory pointer after the
# terminating null byte, but in the corresponding
# variable the string is stored without the null
(my $section, $mem) = unpack('Z* a*', $mem);

print("$section\n");
print("-" x length($section), "\n\n");

# unpack and print
# load id, offset, size,
# path (length of path is "offset - 4")
# and crc32 of each load entry
while (length($mem))
{
    # NOTE:
    # The path is a variable string. And the length
    # of the string is not directly before the string.
    # If the length of the string would be directly
    # before the string then I could use the
    # length/string technique. But in this case I have
    # to use two unpacks. The first is to get the length
    # of the string (offset_to_next_load - 4). The second
    # unpack then has all information to read one
    # repetition of data in one swoop.

    # get length of path
    my $length_of_load_path = unpack('x2 n', $mem) - 4;

    # get data
    (my $load_id, my $offset_to_next_load, my $load_size, my $load_path, my $crc32_load_path, $mem) =
        unpack("n n N A$length_of_load_path N a*", $mem);

    print("Load ID : $load_id\n");
    print("Offset to next load : $offset_to_next_load\n");
    print("Load Size : $load_size\n");
    print("Load Path : $load_path\n");
    print("CRC32 of Load Path : $crc32_load_path\n\n");
}
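The two-unpack approach can also be wrapped in a small function that checks the buffer length before building the template, so that truncated input fails loudly instead of producing undefined fields. This is only a sketch under stated assumptions: the name `parse_record` and the error messages are hypothetical, and the record is assumed to be 8 header bytes (id, offset, size) followed by `offset` bytes of path plus crc32.

```perl
use strict;
use warnings;

# Hypothetical helper: parses one record from the front of $mem and
# returns (\%record, $rest). Dies if the buffer is too short.
sub parse_record {
    my ($mem) = @_;

    # id (2) + offset (2) + size (4) must be present before peeking.
    die "truncated header\n" if length($mem) < 8;

    # First unpack: peek at the offset field to learn the path length.
    my $offset = unpack('x2 n', $mem);
    die "offset too small\n" if $offset < 4;
    die "truncated record\n" if length($mem) < 8 + $offset;
    my $path_len = $offset - 4;

    # Second unpack: read the whole record plus the remaining buffer.
    my (%rec, $rest);
    (@rec{qw(id offset size path crc32)}, $rest) =
        unpack("n n N A$path_len N a*", $mem);

    return (\%rec, $rest);
}

# Usage with the first record of the sample data.
my $mem = pack('H*', '4100001300000BAD2F62696E2F7465737432302E64666C376F6B0F');
my ($rec, $rest) = parse_record($mem);
print("Load ID : $rec->{id}\n");
print("Load Path : $rec->{path}\n");
```

In the main loop this would replace the two unpack calls with `($rec, $mem) = parse_record($mem)`.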