
That's quite handy, and thanks for posting it, but sadly it does not explain why the marked branch sets the value of $1 to 'g' even though it "failed..." to match:
Matching REx "(?:(.)b|.)+" against "ebfg"
   0 <> <ebfg>   |  1:CURLYX[0] {1,32767}(15)
   0 <> <ebfg>   | 14:  WHILEM[1/1](0)
                          whilem: matched 0 out of 1..32767
   0 <> <ebfg>   |  3:    BRANCH(11)
   0 <> <ebfg>   |  4:      OPEN1(6)
   0 <> <ebfg>   |  6:      REG_ANY(7)
   1 <e> <bfg>   |  7:      CLOSE1(9)
   1 <e> <bfg>   |  9:      EXACTF <b>(14)
   2 <eb> <fg>   | 14:        WHILEM[1/1](0)
                                whilem: matched 1 out of 1..32767
   2 <eb> <fg>   |  3:          BRANCH(11)
   2 <eb> <fg>   |  4:            OPEN1(6)
   2 <eb> <fg>   |  6:            REG_ANY(7)
   3 <ebf> <g>   |  7:            CLOSE1(9)
   3 <ebf> <g>   |  9:            EXACTF <b>(14)
                                    failed...
   2 <eb> <fg>   | 11:          BRANCH(13)
   2 <eb> <fg>   | 12:            REG_ANY(14)
   3 <ebf> <g>   | 14:              WHILEM[1/1](0)
                                      whilem: matched 2 out of 1..32767
   3 <ebf> <g>   |  3:                BRANCH(11)
   3 <ebf> <g>   |  4:                  OPEN1(6)
   3 <ebf> <g>   |  6:                  REG_ANY(7)
   4 <ebfg> <>   |  7:                  CLOSE1(9)
   4 <ebfg> <>   |  9:                  EXACTF <b>(14)
                                          failed...
   3 <ebf> <g>   | 11:                BRANCH(13)
   3 <ebf> <g>   | 12:                  REG_ANY(14)
   4 <ebfg> <>   | 14:                    WHILEM[1/1](0)
                                            whilem: matched 3 out of 1..32767
   4 <ebfg> <>   |  3:                      BRANCH(11)
   4 <ebfg> <>   |  4:                        OPEN1(6)
   4 <ebfg> <>   |  6:                        REG_ANY(7)
                                                failed...
   4 <ebfg> <>   | 11:                      BRANCH(13)
   4 <ebfg> <>   | 12:                        REG_ANY(14)
                                                failed...
                                              BRANCH failed...
                                            whilem: failed, trying continuation...
   4 <ebfg> <>   | 15:                    NOTHING(16)
   4 <ebfg> <>   | 16:                    END(0)
Match successful!
Test: 1='g', 2=''
Freeing REx: "(?:(.)b|.)+"

Re^3: Leaking Regex Captures
by Anonymous Monk on Aug 05, 2009 at 12:56 UTC
    That's quite handy, and thanks for posting it, but sadly it does not explain why the marked branch sets the value of $1 to 'g' even though it "failed..." to match:

    'Match successful!' means it did NOT fail.

      The overall regex matches, but the branch doing the capturing fails when the 'g' is in that position. The capturing branch succeeds only on the first pass, when it matches 'eb' and should capture the 'e'. On the later passes it fails, yet it still captures.

      The capturing is inconsistent when backtracking into alternations inside repetitions is involved. That is what does not make sense, and what I am trying to understand.

        It fails, but that does not mean it still captures. In fact, it can't have captured, because that branch was not taken. What it means is that the $1 variable holds invalid data.
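To make the two readings of the trace concrete, here is a minimal Python sketch of a hypothetical toy matcher for this one pattern. It is not perl's regex engine, and the names `toy_match` and `restore_on_fail` are invented for illustration. Like the trace, it writes the group 1 slot at OPEN1/CLOSE1 before it tests for the 'b'. With `restore_on_fail=False`, a failed branch leaves that write behind, which reproduces the `1='g'` result. With `restore_on_fail=True`, the write is undone when the branch fails, leaving the 'e' from the only successful pass. The sketch does not claim which behaviour any particular perl version has.

```python
from typing import Optional, Tuple


def toy_match(subject: str, restore_on_fail: bool) -> Optional[str]:
    """Hypothetical toy matcher for (?:(.)b|.)+, tried at offset 0 only.

    Returns the text of capture group 1, or None if the match fails or the
    group holds no capture. Simplifications (assumptions, not Perl's rules):
    '.' matches any character (Perl's '.' excludes newline), and 'b' is
    compared exactly (the trace's EXACTF is a case-folded compare).
    """
    slot: Optional[Tuple[int, int]] = None  # (start, end) of group 1

    def iterate(pos: int, done: int) -> bool:
        """Try one more pass of the group at pos, else take the continuation."""
        nonlocal slot
        if pos < len(subject):
            saved = slot
            # Alternative 1, (.)b: OPEN1 / REG_ANY / CLOSE1 write the slot
            # before EXACTF <b> is tried, as in the debug trace.
            slot = (pos, pos + 1)
            if pos + 1 < len(subject) and subject[pos + 1] == "b":
                if iterate(pos + 2, done + 1):
                    return True
            if restore_on_fail:
                slot = saved  # undo the capture written by the failed branch
            # Alternative 2, '.': consumes one character, captures nothing.
            if iterate(pos + 1, done + 1):
                return True
        # "whilem: failed, trying continuation": NOTHING and END succeed
        # once the '+' has matched at least one pass.
        return done >= 1

    if not iterate(0, 0):
        return None
    return None if slot is None else subject[slot[0]:slot[1]]


if __name__ == "__main__":
    print(toy_match("ebfg", restore_on_fail=False))  # prints g
    print(toy_match("ebfg", restore_on_fail=True))   # prints e
```

For "xyz" the capturing branch never succeeds at all, so the restoring variant reports no capture (`None`) while the leaking variant reports 'z', the character grabbed by the last failed attempt.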