in reply to unpack less than indicated length

Edit. Actually, different groups aren't necessary:

>perl -wE "say for unpack '(CX/xX.@0x/A)*', qq(\03ab\05cdef\06ghijk)"
ab
cdef
ghijk
>perl -wE "say for unpack '(CCXX/xXX.@0xx/A)*', qq(\x03\x04Hi\x43\x08Hello!)"
3
Hi
67
Hello!

>perl -wE "say for unpack '(C/xX2.@0x/A)(CX2/x.@0x/A)*', qq(\03ab\05cdef\06ghijk)"
ab
cdef
ghijk
>perl -wE "say for unpack '(CC/xX4.@0xx/A)(CCX4/x.@0xx/A)*', qq(\x03\x04Hi\x43\x08Hello!)"
3
Hi
67
Hello!

Either the first or the last group is built differently, so as not to hit "'X' outside of string in unpack" (or the 'x' equivalent).
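
For anyone who would rather run this from a file than fight shell quoting, here is a minimal sketch of the single-group template as a script. The variable names are my own, and the test data is rebuilt with pack on the assumption that the length byte counts itself, as in the one-liners.

```perl
use strict;
use warnings;
use feature 'say';

# Each record is a length byte followed by the payload. The length
# byte counts itself, so "ab" is stored as "\x03ab".
my $data = join '', map { pack('C', 1 + length($_)) . $_ } qw(ab cdef ghijk);

# One group, repeated until the input is exhausted.
my @fields = unpack '(CX/xX.@0x/A)*', $data;

say for @fields;
```

The cursor never leaves the current record here, which is why the same group can be repeated for the first and the last record alike.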

Re^2: unpack less than indicated length
by Eily (Monsignor) on Jun 04, 2018 at 10:10 UTC

    ++ :)

    For those who are staring at this in confusion, as I was a few minutes ago: '.' returns (adds to the stack) the current position in bytes relative to the start of the current () group. This is obvious now that I've seen it (knowing what '.' does in the pack case), but I didn't work it out on my own.

    So, to explain how the second template works (the left part; the right part is basically the same, except that instead of moving right a lot and then left a little, it moves left a little and then right a lot, to avoid moving out of the string):

    If L is the length of the data (2 in the case of "Hi"), the second byte will contain the value L+2 (1 for the type byte, 1 for the length byte, and L for the data). So:

    • pos = 0
    • C reads the type byte (pos = 1)
    • C reads the length byte (L+2) (pos = 2)
    • /x removes the last value from the stack and skips that many bytes (i.e. it skips L+2 bytes). (pos = 2 + L+2 = L+4)
    • X4 goes back 4 bytes (pos = L+4 - 4 = L)
    • . adds the current position (L) to the end of the stack
    • @0 returns to the start of the group (pos = 0)
    • xx skips the type byte and length byte (pos = 2)
    • /A removes the last value from the stack and reads as many (L) ASCII bytes
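
    The steps above can be checked by cutting the template off right after '.', so the value it pushes is returned instead of being consumed by /A. This is only a sketch: the variable names are mine, and the whole two-record string is used because the /x skip runs past the end of the first record.

```perl
use strict;
use warnings;
use feature 'say';

# Two records: type, length (L+2), then L bytes of data.
my $data = "\x03\x04Hi\x43\x08Hello!";

# Stop after '.': the stack holds the type byte and the position L.
my ($type, $len) = unpack '(CC/xX4.)', $data;
say "type=$type L=$len";

# The full left-hand group: '.' feeds /A, which reads L bytes.
my ($type2, $text) = unpack '(CC/xX4.@0xx/A)', $data;
say "type=$type2 text=$text";
```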

    One of the things that puzzled me is why the number after X was twice the number of bytes in the header. With an N-byte header, we first move N bytes by reading it, then move another L+N bytes because the length includes the header. So we are indeed at position L+2N, and going back 2N bytes leaves us at L.
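
    To make the 2N count concrete, here is a hedged sketch that builds the two-group template for an arbitrary header size. tlv_template is a hypothetical helper of mine, not something from the thread, and it assumes the last header byte is the length and that the length includes the whole header.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: two-group template for an N-byte header whose
# last byte holds the record length (header included).
sub tlv_template {
    my ($n) = @_;
    my $back = 2 * $n;    # N bytes read plus N counted in the length
    return "(C$n/xX$back.\@0x$n/A)(C${n}X$back/x.\@0x$n/A)*";
}

# Two records with a 3-byte header: two tag bytes, then length = L + 3.
my $data = pack('C3a*', 1, 2, 5, 'Hi') . pack('C3a*', 7, 8, 9, 'Hello!');

my @out = unpack tlv_template(3), $data;
say for @out;
```

    With N = 1 the helper gives the same template as the first two-group one-liner, just with the counts spelled out as C1 and x1.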

      Another feature added in Perl 5.10. See perl5100delta -> "Incompatible Changes" -> "Byte/character count feature in unpack()".


      Give a man a fish:  <%-{-{-{-<