Re: Seeking with 'x' in unpack and out of bounds reads

unpack, unlike regexes, allows partial matches. E.g., with `$str = "abc"`, `$str =~ /(.{2})*/;` will match "ab", but `unpack "(a2)*", "abc"` will return "ab", "c". In the case of unpack, the second iteration of the sub pattern "a2" is partial. Without 'x', you can tell that a match is partial in unpack because the output is incompatible with the pattern. In my example, you expected 2 characters but got one, so this has to be a partial match. However, 'x' doesn't have a direct effect on the output, so a partial match involving 'x' must be communicated through an error.
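
To make the comparison concrete, here is a minimal sketch that runs both on the same string. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $str = "abc";

# Regex: the group only matches whole 2-character units, so "c" is left out.
$str =~ /(.{2})*/;
my $regex_match = $&;

# unpack: the second iteration of a2 is partial and still yields "c".
my @fields = unpack "(a2)*", $str;

print "regex matched:   '$regex_match'\n";
print "unpack returned: ", join(", ", map { "'$_'" } @fields), "\n";
```

The one-character second field is the sign that the last iteration of "a2" was partial.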

Note that cases like `unpack "(xa)*", "abcd";` don't die because this is not a partial match: the sub pattern "xa" matches exactly 2 times, which is a valid value for *. `unpack "(ax)*", "abc";` does die because there is one full match, and then a partial match of the sub pattern (a has matched but not x). `unpack "(xa)*", "abc";` still doesn't die because, although the match is partial, it fails on 'a', which communicates the failure by returning an empty string (which is not a valid value for "a").
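
The three cases can be checked side by side with a small sketch like the one below. `try_unpack` is a hypothetical helper written for this illustration, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: run unpack inside eval and report either the
# fields it returned or the fact that it died.
sub try_unpack {
    my ($template, $data) = @_;
    my @fields = eval { unpack $template, $data };
    return "died: $@" if $@;
    return "ok: " . join(", ", map { "'$_'" } @fields);
}

my $complete = try_unpack("(xa)*", "abcd");   # two full iterations
my $died     = try_unpack("(ax)*", "abc");    # second iteration fails on x
my $partial  = try_unpack("(xa)*", "abc");    # second iteration fails on a

print "$_\n" for $complete, $died, $partial;
```

Only the second call should die; the third returns an empty string in the place where a one-character field was expected.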

This means you can always tell if there was a partial match. If the pattern matched partially and failed on a token that isn't x, you will get an output that is incompatible with the pattern (e.g. a single character where a2 should return two). If the pattern matched partially because it couldn't skip a byte, it will die. In all other cases, the match was complete.

Re^2: Seeking with 'x' in unpack and out of bounds reads by mxb (Pilgrim) on Apr 27, 2018 at 12:31 UTC
Hi, Thanks for the clear explanation. So in the circumstances where I need to seek at the start of my match, and I'm extracting as many as possible (with a `(...)*` group), what is the best approach? I could assign the results to an array within an `eval {...}` block to catch the error, but when I tried this it would still die and not return the successful matches.
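
The behaviour described here can be reproduced on a toy string. The sketch below first shows that the list is lost when unpack dies inside `eval`, then shows one possible workaround (an assumption of this example, not something proposed in the thread): applying the sub pattern to one slice at a time, so that groups unpacked before the failure are kept. It assumes a fixed-size sub pattern ("ax", two bytes); variable-length chunks would need the offset computed from the data.

```perl
use strict;
use warnings;

my $data = "abc";

# Whole-template attempt: unpack dies on the second 'x', so eval returns
# an empty list and @all stays empty.
my @all   = eval { unpack "(ax)*", $data };
my $error = $@;

# Possible alternative: unpack one group per eval and keep whatever
# succeeded before the first failure.
my @kept;
my $pos = 0;
while ($pos < length $data) {
    my @group = eval { unpack "ax", substr($data, $pos) };
    last if $@;          # stop at the first group that cannot complete
    push @kept, @group;
    $pos += 2;           # "ax" consumes two bytes per iteration
}

print "whole template: ", scalar(@all), " fields, error: $error";
print "group by group: @kept\n";
```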
Re^3: Seeking with 'x' in unpack and out of bounds reads by Eily (Monsignor) on Apr 27, 2018 at 13:27 UTC
You could use the pattern "x4 (NN X8 N x4 /a N)". Instead of skipping the length to get it later, you would fetch it twice and use it once. At least in that case, the pattern doesn't fail on an x (actually, since the x4 in the parentheses skips the bytes read by the second N, you know that there is something to skip). Though actually, the fact that this fails is a good thing, because it tells you that, for some reason, there are still some bytes (between 1 and 3) after the last chunk that let x4 skip at least once, but not four times in a row. I.e., your data is invalid. Try `unpack "H*", pack "H*", <DATA>;`. It looks like pack isn't very smart with the \n at the end of the string.
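
The effect of the trailing newline can be seen on a short made-up line of hex text (the real data from the thread is not reproduced here). For a non-alphabetic character, pack's H format uses the low 4 bits of the character code, so "\n" (0x0a) contributes an extra nibble and therefore an extra byte.

```perl
use strict;
use warnings;

# Hypothetical input line, as it would arrive from a filehandle.
my $line = "4142\n";

# Round trip with the newline still attached.
my $with_newline = unpack "H*", pack "H*", $line;

# Same round trip after chomp.
chomp(my $clean = $line);
my $without_newline = unpack "H*", pack "H*", $clean;

print "with newline:    $with_newline\n";
print "without newline: $without_newline\n";
```

Chomping the line before packing avoids the stray byte at the end of the data.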
Re^4: Seeking with 'x' in unpack and out of bounds reads by vr (Curate) on Apr 27, 2018 at 15:36 UTC
If the rogue chunk is e.g. 7 bytes long, then unpacking with the proposed template will die on "X", so wrapping it in eval is required anyway if the data are unreliable.

"However, 'x' doesn't have a direct effect on the output, so a partial match involving 'x' must be communicated through an error."

Looks to me like an attempt to whitewash Perl's inconsistent behaviour :-). By similar reasoning, failure to unpack e.g. Pascal strings (as in `unpack 'C/a', qq(\03ab)`) should be fatal, I think.

Side-note: PNG tags were made human-readable for a good reason, so perhaps "A4" instead of "N" (or "L") will serve better. E.g., if the data are super-reliable (CRC sums to be ignored), then chunks can be read into a hash: `my ( $head, %chunks ) = unpack 'a8 (x4 A4 X8 N x4 /a x4)*', $input; say for keys %chunks;`
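
A self-contained way to try both snippets, using synthetic input rather than a real PNG file. `make_chunk`, the chunk contents and the zero CRC values are assumptions made up for this sketch; only the two unpack templates come from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: one chunk laid out as length (N), 4-byte type,
# data, CRC (N). The CRC is a dummy value since the template skips it.
sub make_chunk {
    my ($type, $data) = @_;
    return pack "N a4 a* N", length($data), $type, $data, 0;
}

my $input = "\x89PNG\r\n\x1a\n"
          . make_chunk("IHDR", "header-bytes")
          . make_chunk("IEND", "");

# Template from the post: chunk type becomes the key, chunk data the value.
my ($head, %chunks) = unpack 'a8 (x4 A4 X8 N x4 /a x4)*', $input;
printf "%s: %d bytes\n", $_, length $chunks{$_} for sort keys %chunks;

# Pascal string whose length byte (3) promises more than the 2 bytes present.
my ($pascal) = unpack 'C/a', qq(\03ab);
print "pascal string: '$pascal'\n";
```

The Pascal string comes back shortened instead of raising an error, which is the inconsistency being pointed out.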
Re^5: Seeking with 'x' in unpack and out of bounds reads by Eily (Monsignor) on Apr 27, 2018 at 15:56 UTC
Re^3: Seeking with 'x' in unpack and out of bounds reads by BillKSmith (Monsignor) on Apr 27, 2018 at 14:56 UTC
Another approach is to preprocess your string with a regex to remove the troublesome byte(s). `$str =~ s/^((?:...)+).{0,2}$/$1/;` UPDATE: Corrected typo noticed by kcott. Bill
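
A quick check of that substitution on a made-up string, assuming (as the regex suggests) records of 3 bytes each. One caveat worth keeping in mind: without the /s modifier, `.` does not match "\n", so binary data containing newlines would probably need `/s` added; the sample below avoids newlines.

```perl
use strict;
use warnings;

# 8 bytes: two full 3-byte groups followed by 2 stray bytes.
my $str = "abcdefgh";

# Regex from the post, applied to a copy so the original stays visible.
(my $trimmed = $str) =~ s/^((?:...)+).{0,2}$/$1/;

# With the stray bytes gone, every group unpacks completely.
my @groups = unpack "(a3)*", $trimmed;

print "trimmed: $trimmed\n";
print "groups:  @groups\n";
```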