Dirk80 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I've written a little script and it is working. My input is a hexstream. It is constructed as follows:

The problem is that each record contains a string of variable length (the path). The length of the path is the value of the offset field minus 4.
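To make the length rule concrete, here is a minimal sketch that pulls the path out of a single record. The record bytes are the first entry of the sample data; the field widths (16-bit id, 16-bit offset, 32-bit size, all big-endian) are inferred from the `'n n N'` template used in the script, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# First record of the sample data: id, offset, size, path, crc32.
my $record = pack('H*', '4100001300000BAD2F62696E2F7465737432302E64666C376F6B0F');

# Read the two 16-bit big-endian fields; only the offset is needed here.
my (undef, $offset) = unpack('n n', $record);

# In this sample the offset equals the 4-byte size field plus the path,
# so the path length is offset - 4 (19 - 4 = 15).
my $path_length = $offset - 4;

# The path starts after the fixed 8-byte part (2 + 2 + 4 bytes).
my $path = substr($record, 8, $path_length);

print "offset=$offset path_length=$path_length path=$path\n";
```

For this record the sketch should report an offset of 19 and the 15-character path /bin/test20.dfl.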

The following script is working.

#!/usr/bin/perl

use strict;
use warnings;

my $mem = "";
my ($section, $load_id, $offset_to_next_load, $load_size, $load_path, $crc32_load_path);

my $hex_data = "54424c004100001300000BAD2F62696E2F7465737432302E64666C376F6B0F42000013000000042F62696E2F7465737430312E64666C376D6C0F43000013000000042F62696E2F7465737430322E64666C376D6D0F44000013000000042F62696E2F7465737430332E64666C376D6E0F45000013000000042F62696E2F7465737430342E64666C376D6F0F46000013000000042F62696E2F7465737430352E64666C376D700F47000013000000042F62696E2F7465737430362E64666C376D710F48000013000000042F62696E2F7465737430372E64666C376D720F49000013000000042F62696E2F7465737430382E64666C376D730F4A000013000000042F62696E2F7465737430392E64666C376D740F4B000013000000042F62696E2F7465737431302E64666C376E6B0F4C000013000000042F62696E2F7465737431312E64666C376E6C0F4D000013000000042F62696E2F7465737431322E64666C376E6D0F4E000013000000042F62696E2F7465737431332E64666C376E6E0F4F000013000000042F62696E2F7465737431342E64666C376E6F0F50000013000000042F62696E2F7465737431352E64666C376E700F51000013000000042F62696E2F7465737431362E64666C376E710F52000013000000042F62696E2F7465737431372E64666C376E720F53000013000000042F62696E2F7465737431382E64666C376E730F54000013000000042F62696E2F7465737431392E64666C376E740F55000011000000042F6574632F74657374322E73683B0C08495600000D000000082F477053772E63686B1175D3BB";

# put hex data into memory
$mem = pack('H*', $hex_data );

# unpack null terminated section name from memory
$section = unpack('Z*', $mem);
$mem = substr($mem, length($section) + 1 );

# print section name
print $section . ":\n";
for (1 .. length($section) + 1){print "-"}
print "\n\n";

# unpack and print load id, offset, size, path (length of path is "offset - 4") and crc32 of each load entry
while (length($mem) > 0) {
    ($load_id, $offset_to_next_load, $load_size) = unpack('n n N', $mem);
    $mem = substr($mem, 8);

    print "Load ID : " . $load_id . "\n";
    print "Offset to next load : " . $offset_to_next_load . "\n";
    print "Load Size : " . $load_size . "\n";

    ($load_path, $crc32_load_path) = unpack('A' . ($offset_to_next_load-4) . 'N', $mem);
    $mem = substr($mem, $offset_to_next_load);

    print "Load Path : " . $load_path . "\n";
    print "CRC32 of Load Path : " . $crc32_load_path . "\n";
    print "\n\n";
}

I would be very interested to know whether it is possible to read the hex data with one elegant unpack command. I tried, but because of the variable length of the path I did not find a solution. Of course, because of the repetitions, the following variables would have to be arrays instead of scalars ($load_id, $offset_to_next_load, $load_size), but I think I could handle that with a map command within the unpack command. My main problem is the variable length of the path.

Thank you for your help.

Dirk

Re: Better unpack solution available
by ikegami (Patriarch) on Apr 09, 2010 at 15:46 UTC

    It's not really your unpacks that are messy.

    • The comments that parrot the obvious.
    • Concatenation is used when interpolation or printf would be clearer.
    • Some var names are needlessly long. ("load" doesn't add value.)
    • Each record's size and crc are printed, but that's not data.

    But your unpacks could be improved by treating each record as having the following format:

    header          body         footer
    ------------    ---------    ------
    id body_size    size path    crc32
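A rough sketch of why this view is convenient: with the length sitting directly in front of the body, the `/` template character can handle the length field, here in the pack direction. The values are those of the first sample record; treating the 16-bit field as `body_size` is the assumption described above.

```perl
use strict;
use warnings;

# Values of the first sample record (id, size, path, crc32).
my ($id, $size, $path, $crc) = (0x4100, 0x0BAD, '/bin/test20.dfl', 0x376F6B0F);

# Body = 32-bit size followed by the path.
my $body = pack('N a*', $size, $path);

# 'n/a*' writes the body length as a 16-bit prefix, then the body itself.
my $record = pack('n n/a* N', $id, $body, $crc);

# The same template shape reads the record back.
my ($id2, $body2, $crc2) = unpack('n n/a N', $record);

print uc(unpack('H*', $record)), "\n";
```

The printed hex string should match the first record of the sample data, with 0013 as the automatically generated length.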

    How I'd write it:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $mem = pack('H*', '54424c004100001300000BAD2F62696E2F7465737432302E64666C376F6B0F42000013000000042F62696E2F7465737430312E64666C376D6C0F43000013000000042F62696E2F7465737430322E64666C376D6D0F44000013000000042F62696E2F7465737430332E64666C376D6E0F45000013000000042F62696E2F7465737430342E64666C376D6F0F46000013000000042F62696E2F7465737430352E64666C376D700F47000013000000042F62696E2F7465737430362E64666C376D710F48000013000000042F62696E2F7465737430372E64666C376D720F49000013000000042F62696E2F7465737430382E64666C376D730F4A000013000000042F62696E2F7465737430392E64666C376D740F4B000013000000042F62696E2F7465737431302E64666C376E6B0F4C000013000000042F62696E2F7465737431312E64666C376E6C0F4D000013000000042F62696E2F7465737431322E64666C376E6D0F4E000013000000042F62696E2F7465737431332E64666C376E6E0F4F000013000000042F62696E2F7465737431342E64666C376E6F0F50000013000000042F62696E2F7465737431352E64666C376E700F51000013000000042F62696E2F7465737431362E64666C376E710F52000013000000042F62696E2F7465737431372E64666C376E720F53000013000000042F62696E2F7465737431382E64666C376E730F54000013000000042F62696E2F7465737431392E64666C376E740F55000011000000042F6574632F74657374322E73683B0C08495600000D000000082F477053772E63686B1175D3BB');

    (my $section, $mem) = unpack('Z* a*', $mem);

    print("$section\n");
    print( ( "-" x length($section) ), "\n");

    while (length($mem)) {
        (my $id, my $body, my $crc, $mem) = unpack('n n/a N a*', $mem);
        my ($size, $path) = unpack('N a*', $body);

        # Check CRC here.

        print("Load ID : $id\n");
        print("Load Size : $size\n");
        print("Load Path : $path\n");
        print("\n");
    }

    TBL
    ---
    Load ID : 16640
    Load Size : 2989
    Load Path : /bin/test20.dfl

    Load ID : 16896
    Load Size : 4
    Load Path : /bin/test01.dfl

    Load ID : 17152
    Load Size : 4
    Load Path : /bin/test02.dfl

    ...

    Update: Added enumeration at the top.

      Thank you very much for your great answer. I think a good way of learning is to try it myself first and then look at a better solution.

      It was a very good idea to use 'a*' to read the rest of the memory and shorten it that way instead of using substr. It is also interesting that you simply interpret the offset field as the length of a body, so that the length sits directly in front of the body. And there is another small thing I already knew but did not use: the 'x' operator to underline the string.

      So I learnt a lot from your post. Thank you.
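Once the offset field is read as a body length, the original wish for a single unpack call becomes reachable. The following is only a sketch under that assumption, not code from the thread: it uses a repeated `( )*` group and, to stay short, only the section name plus the first two sample records. The `@records` structure is illustrative.

```perl
use strict;
use warnings;

# Shortened sample: section name plus the first two records only.
my $mem = pack('H*',
    '54424c00'
  . '4100001300000BAD2F62696E2F7465737432302E64666C376F6B0F'
  . '42000013000000042F62696E2F7465737430312E64666C376D6C0F');

# 'Z*' reads the section name; the group '(n n/a N)*' repeats until the
# data is exhausted and yields id, body, crc for every record.
my ($section, @fields) = unpack('Z* (n n/a N)*', $mem);

# Regroup the flat list into one hash per record and split each body.
my @records;
while (my ($id, $body, $crc) = splice(@fields, 0, 3)) {
    my ($size, $path) = unpack('N a*', $body);
    push @records, { id => $id, size => $size, path => $path, crc => $crc };
}

print "$section\n";
printf "%d %d %s\n", @{$_}{qw(id size path)} for @records;
```

The body still needs a second, small unpack to separate size and path, so this is one unpack per level of nesting rather than one unpack in total.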

        Hello,

        It's me again. I have now written a new version of the script. The first unpack gets the length of the path. The second unpack reads one repetition of the data in one swoop. I like this solution because it matches the original record layout most closely. Still, I will keep in mind that I can use the length/string technique from ikegami's post whenever the length sits directly before the string.

        #!/usr/bin/perl

        use strict;
        use warnings;

        my $mem = pack('H*', '54424c004100001300000BAD2F62696E2F7465737432302E64666C376F6B0F42000013000000042F62696E2F7465737430312E64666C376D6C0F43000013000000042F62696E2F7465737430322E64666C376D6D0F44000013000000042F62696E2F7465737430332E64666C376D6E0F45000013000000042F62696E2F7465737430342E64666C376D6F0F46000013000000042F62696E2F7465737430352E64666C376D700F47000013000000042F62696E2F7465737430362E64666C376D710F48000013000000042F62696E2F7465737430372E64666C376D720F49000013000000042F62696E2F7465737430382E64666C376D730F4A000013000000042F62696E2F7465737430392E64666C376D740F4B000013000000042F62696E2F7465737431302E64666C376E6B0F4C000013000000042F62696E2F7465737431312E64666C376E6C0F4D000013000000042F62696E2F7465737431322E64666C376E6D0F4E000013000000042F62696E2F7465737431332E64666C376E6E0F4F000013000000042F62696E2F7465737431342E64666C376E6F0F50000013000000042F62696E2F7465737431352E64666C376E700F51000013000000042F62696E2F7465737431362E64666C376E710F52000013000000042F62696E2F7465737431372E64666C376E720F53000013000000042F62696E2F7465737431382E64666C376E730F54000013000000042F62696E2F7465737431392E64666C376E740F55000011000000042F6574632F74657374322E73683B0C08495600000D000000082F477053772E63686B1175D3BB');

        # NOTE:
        # Section name is a null terminated string.
        # Z* is setting the memory pointer after the
        # terminating null byte, but in the corresponding
        # variable the string is stored without the null
        (my $section, $mem) = unpack('Z* a*', $mem);

        print("$section\n");
        print("-" x length($section), "\n\n");

        # unpack and print
        # load id, offset, size,
        # path (length of path is "offset - 4")
        # and crc32 of each load entry
        while (length($mem)) {
            # NOTE:
            # The path is a variable string. And the length
            # of the string is not directly before the string.
            # If the length of the string would be directly
            # before the string then I could use the
            # length/string technique. But in this case I have
            # to use two unpacks. The first is to get the length
            # of the string (offset_to_next_load - 4). The second
            # unpack then has all information to read one
            # repetition of data in one swoop.

            # get length of path
            my $length_of_load_path = unpack('x2 n', $mem) - 4;

            # get data
            (my $load_id, my $offset_to_next_load, my $load_size,
             my $load_path, my $crc32_load_path, $mem) =
                unpack("n n N A$length_of_load_path N a*", $mem);

            print("Load ID : $load_id\n");
            print("Offset to next load : $offset_to_next_load\n");
            print("Load Size : $load_size\n");
            print("Load Path : $load_path\n");
            print("CRC32 of Load Path : $crc32_load_path\n\n");
        }
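One detail worth keeping in mind when comparing the scripts in this thread: this version reads the path with `A`, while ikegami's reads it with `a`. For the sample paths both give the same result, but in unpack `A` strips trailing spaces and NUL bytes whereas `a` returns the bytes unchanged. A small sketch with a made-up padded field (not part of the sample data):

```perl
use strict;
use warnings;

# Hypothetical 8-byte field: a 4-character path padded with 4 spaces.
my $field = '/a/b' . (' ' x 4);

my ($stripped) = unpack('A8', $field);   # 'A' removes trailing spaces and NULs
my ($verbatim) = unpack('a8', $field);   # 'a' returns the bytes unchanged

printf "A: [%s] (%d bytes)\n", $stripped, length $stripped;
printf "a: [%s] (%d bytes)\n", $verbatim, length $verbatim;
```

If a path could legitimately end in a space, `a` would be the safer choice; otherwise the difference does not matter for this data.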