paulrh has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to unpack type, length, value data packed as follows:

1 byte as 'type', 1 byte as 'length', length-2 bytes of data

The length field includes the first 2 bytes, so its value is 2 greater than the length of the value. Is there a more direct way to unpack length-2 bytes into the value? In the code below, I append 2 bytes of data to the end, read 2 extra bytes, then chop them afterwards.

my $inputString = "\x03\x04Hi\x43\x08Hello!";
my %myDict = unpack("(CC/AX2)*", $inputString . "00");
chop(%myDict);
chop(%myDict);
print "$myDict{0x03}\n"; # "Hi"
print "$myDict{0x43}\n"; # "Hello!"

Re: unpack less than indicated length
by hexcoder (Curate) on Jun 03, 2018 at 18:54 UTC
    Hello paulrh,

    you can do it like this:
    (a) first splitting the input string into complete records with (xCXX /A)*, (this skips the type, reads the length byte, goes back 2 bytes, and uses the length to read the record)
    (b) then for each record extract type and string map { unpack "CxA*", $_ } (this reads the type, skips the length byte, and reads the string of the record).

    use strict;
    use warnings;
    my $inputString = "\x03\x04Hi\x43\x08Hello!";
    my %myDict = map { unpack "CxA*", $_ } unpack "(xCXX /A)*", $inputString;
    print "$myDict{0x03}\n"; # "Hi"
    print "$myDict{0x43}\n"; # "Hello!"
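To make the two steps easier to follow, here is a minimal sketch that keeps the intermediate records from step (a) in an array before step (b) takes them apart. The names @records, $record, $type and $value are only for illustration and are not part of the original code.

```perl
use strict;
use warnings;

my $inputString = "\x03\x04Hi\x43\x08Hello!";

# Step (a): one list element per complete record (type byte, length byte, data).
my @records = unpack "(xCXX /A)*", $inputString;

# Step (b): split each record into its type and its value.
for my $record (@records) {
    my ($type, $value) = unpack "CxA*", $record;
    printf "record %s: type=0x%02X value=%s\n",
        unpack("H*", $record), $type, $value;
}
```

Note that A strips trailing whitespace and NUL bytes, so values that end in such bytes would be trimmed; a keeps the bytes as they are.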

    [edit]
    better wording: chopping -> splitting
    Your code created an extra entry in the hash (type = 0x30, empty string) because of the appended "00".
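A quick way to look for that extra entry is to list the keys of the hash produced by the original template, before any chop. This is only a sketch of a check; the loop variable $type is an illustrative name.

```perl
use strict;
use warnings;

my $inputString = "\x03\x04Hi\x43\x08Hello!";

# The template from the question, with the two padding bytes "00" appended.
my %myDict = unpack("(CC/AX2)*", $inputString . "00");

# List every type that ended up in the hash and the length of its value.
for my $type (sort { $a <=> $b } keys %myDict) {
    printf "type 0x%02X => %d byte(s)\n", $type, length $myDict{$type};
}
```

The padding characters are themselves read as a type byte and a length byte, which is where the key 0x30 (the character "0") should come from.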
    [edit2]
    Incorporated the hint from AnomalousMonk (thanks!)

      my %myDict = map { unpack "CxA*" } unpack "(xCXX /A)*", $inputString;

      NB: unpack TEMPLATE (with no second argument) unpacks the default scalar $_ only with Perl versions 5.10 and above. Prior to 5.10, use
          my %myDict = map { unpack "CxA*", $_ } unpack "(xCXX /A)*", $inputString;


      Give a man a fish:  <%-{-{-{-<

Re: unpack less than indicated length
by vr (Curate) on Jun 04, 2018 at 08:44 UTC

    Edit. Actually, different groups aren't necessary:

    >perl -wE "say for unpack '(CX/xX.@0x/A)*', qq(\03ab\05cdef\06ghijk)"
    ab
    cdef
    ghijk
    >perl -wE "say for unpack '(CCXX/xXX.@0xx/A)*', qq(\x03\x04Hi\x43\x08Hello!)"
    3
    Hi
    67
    Hello!

    >perl -wE "say for unpack '(C/xX2.@0x/A)(CX2/x.@0x/A)*', qq(\03ab\05cdef\06ghijk)"
    ab
    cdef
    ghijk
    >perl -wE "say for unpack '(CC/xX4.@0xx/A)(CCX4/x.@0xx/A)*', qq(\x03\x04Hi\x43\x08Hello!)"
    3
    Hi
    67
    Hello!

    Either 1st or last group is built differently, so as not to hit "X(x) outside of string in unpack".
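To see why one group has to differ, the sketch below applies the "skip forward, then back up" group to every record and catches the resulting error with eval, then runs the two-group template for comparison. The variable names @fields, $error and @ok are illustrative only.

```perl
use strict;
use warnings;

my $inputString = "\x03\x04Hi\x43\x08Hello!";

# Same group for every record: on the last record the forward skip of
# (length byte) bytes would run past the end of the string.
my @fields = eval { unpack '(CC/xX4.@0xx/A)*', $inputString };
my $error  = $@;
print "single group failed: $error" if $error;

# First group skips forward and then backs up; the repeated group backs up
# first, so it never has to move past the end of the string.
my @ok = unpack '(CC/xX4.@0xx/A)(CCX4/x.@0xx/A)*', $inputString;
print join(", ", @ok), "\n";
```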

      ++ :)

      For those who are staring at this in confusion like I did a few minutes ago, '.' returns (adds to the stack) the current position in bytes relative to the start of the current () group. This is obvious (knowing what '.' does in the pack case) now that I've seen it but I didn't understand it on my own.

      So to explain how the second template works (the left part; the right part is basically the same, except that instead of moving right a lot and then left a little, it moves left a little and then right a lot, to avoid moving out of the string):

      If L is the length of the data (2 in the case of "Hi"), the second byte will contain the value L+2 (1 for the type byte, 1 for the length byte, and L for the data). So:

      • pos = 0
      • C reads the type byte (pos = 1)
      • C reads the length byte (L+2) (pos = 2)
      • /x removes the last value from the stack, and skips as many bytes (ie: skip L+2 bytes). (pos = 2 + L+2 = L+4)
      • X4 goes back 4 bytes (pos = L+4 - 4 = L)
      • . adds the current position (L) to the end of the stack
      • @0 returns to the start of the group (pos = 0)
      • xx skips the type byte and length byte (pos = 2)
      • /A removes the last value from the stack and reads as many (L) ASCII bytes
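The steps above can be checked by cutting the template off right after '.', which leaves the type byte and the computed position on the returned list. This is a small sketch using the first record of the example string; the variable names are illustrative.

```perl
use strict;
use warnings;

my $inputString = "\x03\x04Hi\x43\x08Hello!";

# Stop right after '.': the length byte has been consumed by /x, so the list
# holds the type byte followed by the position L within the group.
my ($type, $pos) = unpack '(CC/xX4.)', $inputString;
print "type=$type pos=$pos\n";

# The complete first group: the position becomes the count for A.
my ($sameType, $value) = unpack '(CC/xX4.@0xx/A)', $inputString;
print "type=$sameType value=$value\n";
```

For "Hi" the position pushed by '.' is 2, which is exactly the number of data bytes that /A then reads.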

      One of the things that puzzled me is why the number after X was twice the number of bytes in the header. With an N-byte header, we first move N bytes by reading the header, then move L+N bytes because the length includes N. So we are indeed at position L+2N, and going back 2N bytes leaves us at position L.

        Another feature added in Perl 5.10. See perl5100delta -> "Incompatible Changes" -> "Byte/character count feature in unpack()".


