unpack less than indicated length

paulrh has asked for the wisdom of the Perl Monks concerning the following question:

Re: unpack less than indicated length by hexcoder (Curate) on Jun 03, 2018 at 18:54 UTC
Hello paulrh, you can do it in two steps:

(a) First split the input string into complete records with `(xCXX /A)*`. Each pass through the group skips the type byte, reads the length byte, goes back 2 bytes, and uses the length to read the whole record.

(b) Then, for each record, extract the type and the string with `map { unpack "CxA*", $_ }`. This reads the type, skips the length byte, and reads the rest of the record as the string.

    use strict;
    use warnings;

    my $inputString = "\x03\x04Hi\x43\x08Hello!";
    my %myDict = map { unpack "CxA*", $_ } unpack "(xCXX /A)*", $inputString;

    print "$myDict{0x03}\n";   # "Hi"
    print "$myDict{0x43}\n";   # "Hello!"

[edit] Better wording: chopping -> splitting.

Your code created an extra entry in the hash (type = 0x30, empty string) because of the appended "00".

[edit2] Incorporated the hint from AnomalousMonk (thanks!)
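For comparison, here is a minimal sketch of the same parse written as a plain loop, which may make the record layout easier to see. The helper name `parse_records` is made up for this illustration, and the layout (one type byte, one length byte that counts the whole record including the header, then the data) is assumed from the sample string above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): walk the buffer by hand.
# Assumed layout per record: type byte, length byte (header included), data.
sub parse_records {
    my ($buf) = @_;
    my %dict;
    my $pos = 0;
    while ($pos + 2 <= length $buf) {
        my ($type, $len) = unpack 'CC', substr($buf, $pos, 2);
        # Stop if the length cannot cover the 2-byte header
        # or if the record would run past the end of the buffer.
        last if $len < 2 || $pos + $len > length $buf;
        $dict{$type} = substr($buf, $pos + 2, $len - 2);
        $pos += $len;
    }
    return %dict;
}

my %myDict = parse_records("\x03\x04Hi\x43\x08Hello!");
print "$myDict{0x03}\n";
print "$myDict{0x43}\n";
```

Unlike the `A` template, `substr` keeps trailing spaces and NUL bytes in the data, and trailing bytes that do not form a complete record are simply dropped in this sketch.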
Re^2: unpack less than indicated length by AnomalousMonk (Archbishop) on Jun 03, 2018 at 19:45 UTC
`my %myDict = map { unpack "CxA*" } unpack "(xCXX /A)*", $inputString;`

NB: `unpack TEMPLATE` with no second argument unpacks the default scalar `$_` only in Perl 5.10 and above. Prior to 5.10, use

`my %myDict = map { unpack "CxA*", $_ } unpack "(xCXX /A)*", $inputString;`

Give a man a fish: `<%-{-{-{-<`
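A small sketch comparing the two spellings side by side. The `@records` list is made-up sample data standing in for the output of the splitting step, and the first `map` assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Sample records (an assumption for this sketch): type byte, length byte, data.
my @records = ("\x03\x04Hi", "\x43\x08Hello!");

# Perl 5.10+: unpack falls back to $_ when no string is given.
my @implicit = map { unpack "CxA*" } @records;

# Any version: pass $_ explicitly.
my @explicit = map { unpack "CxA*", $_ } @records;

print "same result\n" if "@implicit" eq "@explicit";
```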
Re: unpack less than indicated length by vr (Curate) on Jun 04, 2018 at 08:44 UTC
Edit. Actually, different groups aren't necessary:

    >perl -wE "say for unpack '(CX/xX.@0x/A)*', qq(\03ab\05cdef\06ghijk)"
    ab
    cdef
    ghijk

    >perl -wE "say for unpack '(CCXX/xXX.@0xx/A)*', qq(\x03\x04Hi\x43\x08Hello!)"
    3
    Hi
    67
    Hello!

The original answer, with two differently built groups:

    >perl -wE "say for unpack '(C/xX2.@0x/A)(CX2/x.@0x/A)*', qq(\03ab\05cdef\06ghijk)"
    ab
    cdef
    ghijk

    >perl -wE "say for unpack '(CC/xX4.@0xx/A)(CCX4/x.@0xx/A)*', qq(\x03\x04Hi\x43\x08Hello!)"
    3
    Hi
    67
    Hello!

Either the first or the last group is built differently, so as not to hit "'X' (or 'x') outside of string in unpack".
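Since the single-group template returns the type and the string of each record in turn, the flat list can probably be assigned straight to a hash. A minimal sketch as a script instead of a one-liner, assuming Perl 5.10 or later for `.` in `unpack`; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $input = "\x03\x04Hi\x43\x08Hello!";

# One pass: every iteration of the group yields (type, data),
# so the resulting list is already in key/value order.
my %dict = unpack '(CCXX/xXX.@0xx/A)*', $input;

printf "0x%02X => %s\n", $_, $dict{$_} for sort { $a <=> $b } keys %dict;
```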
Re^2: unpack less than indicated length by Eily (Monsignor) on Jun 04, 2018 at 10:10 UTC
++ :)

For those who are staring at this in confusion like I did a few minutes ago, '.' returns (adds to the stack) the current position in bytes relative to the start of the current () group. This is obvious now that I've seen it (knowing what '.' does in the pack case), but I didn't understand it on my own.

So to explain how the second template (the one with `X4`) works, taking the left part (the right part is basically the same, except that instead of moving right a lot and then left a little, it moves left a little and then right a lot, to avoid moving out of the string): if L is the length of the data (2 in the case of "Hi"), the second byte will contain the value L+2 (1 for the type byte, 1 for the length byte, and L for the data). So, starting at pos = 0:

- `C` reads the type byte (pos = 1)
- `C` reads the length byte (L+2) (pos = 2)
- `/x` removes the last value from the stack and skips that many bytes, i.e. L+2 bytes (pos = 2 + L+2 = L+4)
- `X4` goes back 4 bytes (pos = L+4 - 4 = L)
- `.` adds the current position (L) to the end of the stack
- `@0` returns to the start of the group (pos = 0)
- `xx` skips the type byte and the length byte (pos = 2)
- `/A` removes the last value from the stack and reads that many (L) ASCII bytes

One of the things that puzzled me is why the number after X was twice the number of bytes in the header. With an N-byte header, we first move N bytes by reading them, then move L+N bytes because the length includes N. So we are indeed at position L+2N, and going back 2N bytes leaves exactly L.
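The position arithmetic above can be checked with two truncated templates that stop right after `.`. This is only a sketch on the thread's sample string (N = 2, L = 2 for "Hi"), assuming Perl 5.10 or later; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $input = "\x03\x04Hi\x43\x08Hello!";

# Read both header bytes, skip L+2 more: '.' should report L + 2N.
my ($type, $after_skip) = unpack '(CC/x.)', $input;

# Same, but back up 2N = 4 bytes first: '.' should report L.
my (undef, $data_len) = unpack '(CC/xX4.)', $input;

print "type=$type after_skip=$after_skip data_len=$data_len\n";
```

For the first record this should give an offset of 6 after the skip and 2 after backing up, matching L+2N and L.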
Re^3: unpack less than indicated length by AnomalousMonk (Archbishop) on Jun 04, 2018 at 14:36 UTC
Another feature added in Perl 5.10. See perl5100delta -> "Incompatible Changes" -> "Byte/character count feature in unpack()".

Give a man a fish: `<%-{-{-{-<`