in reply to Re: Leaking Regex Captures
in thread Leaking Regex Captures

'2c3w' cannot be matched only by the second parentheses; the first parentheses must match as well, otherwise the entire match would fail. Given that both the first and the second must have matched successfully, if both $1 and $2 should "retain their values from the last successful match", then $1 should be 2, not 3.

Since /^(?:(?:(\d+(?![rw]))\s*c\s*)|(?:(\d+(?![rc]))\s*w\s*)|(?:(\d+(?![cw]))\s*r\s*))+/i also works as expected, it seems to me that the definition of "last successful match" might be changing between runs of a repetition.

On the first pass, a successful match requires the whole alternation to match before the capture variable is set, but on subsequent repeats, only the parentheses need to match before $1 changes?

$_ = 'bb ca de'; /(?:(.)b|.)+/i; print "Test: 1='$1', 2='$2'\n";
# Prints: Test: 1='e', 2=''

# vs
$_ = 'e'; /(?:(.)b|.)+/i; print "Test: 1='$1', 2='$2'\n";
# Prints: Test: 1='', 2=''

# BUT!
$_ = 'efg'; /(?:(.)b|.)+/i; print "Test: 1='$1', 2='$2'\n";
# Prints: Test: 1='', 2=''

This is all quite strange. The '1w1w' test shows that you don't need $1 to be set in order for it to be stomped, so I've no idea why the 'efg' test didn't fail.

All I wanted was to allow users to enter their options in any order!
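For the any-order goal itself, one way to sidestep captures inside a repeated alternation is to consume one option per match with \G and /gc. This is only a minimal sketch, assuming each option is a number followed by c, w, or r as in the pattern above. parse_options is a hypothetical name, and the sketch does not reproduce the negative lookaheads of the original regex.

```perl
use strict;
use warnings;

# Hypothetical helper: parse strings such as '2c3w' or '3 w 2 c 1 r'
# into a hash ref keyed by the option letter.
sub parse_options {
    my ($input) = @_;
    my %opt;
    pos($input) = 0;
    # Each iteration matches exactly one "number letter" pair starting
    # where the previous match ended (\G). /c keeps pos() on failure.
    while ($input =~ /\G\s*(\d+)\s*([cwr])\s*/gci) {
        $opt{ lc $2 } = $1;
    }
    # Reject input with unparsed trailing text, or with no options at all.
    return if (pos($input) // 0) != length $input;
    return unless %opt;
    return \%opt;
}

my $opt = parse_options('2c3w');
print join(', ', map { "$_=$opt->{$_}" } sort keys %$opt), "\n";   # prints c=2, w=3
```

Because every match captures a single pair, $1 and $2 are read immediately after the match that set them, so nothing from an earlier iteration can leak through.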


PS: How does one tell which capture matched, if there is garbage in the other capture variables?

Re^3: Leaking Regex Captures
by jwkrahn (Abbot) on Aug 04, 2009 at 16:38 UTC
    '2c3w' cannot be matched only by the second parentheses; the first parentheses must match as well, otherwise the entire match would fail.

    Incorrect. You are using alternation so only one of the alternatives has to match for the entire match to be successful.

    Given that both the first and the second must have matched successfully,

    Using alternation only one or the other can match successfully, but not both at the same time.

    Update:

    PS: How does one tell which capture matched, if there is garbage in the other capture variables?

    From perlvar:

    One can use "$#-" to find the last matched subgroup in the last successful match.
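    As an illustration of the "$#-" tip, here is a small sketch on a single, non-repeated alternation, where only one group can take part in the match. The helper name which_group and the simplified pattern are assumptions made for the example; whether "$#-" is equally reliable inside the repeated group discussed in this thread is exactly what is in question.

```perl
use strict;
use warnings;

# Hypothetical helper: report which alternative matched a single option.
sub which_group {
    my ($input) = @_;
    # No repetition here, so at most one capture group participates.
    return unless $input =~ /^\s*(?:(\d+)\s*c|(\d+)\s*w|(\d+)\s*r)\s*$/i;
    # $#- is the index of the last group that matched; $#+ would instead
    # be the number of groups in the pattern (3 here).
    my $last = $#-;
    return $last;
}

print which_group('3w'), "\n";   # prints 2
```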

      '2c3w' cannot be matched only by the second parentheses; the first parentheses must match as well, otherwise the entire match would fail.
      Incorrect. You are using alternation so only one of the alternatives has to match for the entire match to be successful.

      The alternation is repeated with the + so that multiple branches can match. And the regex is anchored with an '^' so in order for the '3w' to match, the '2c' must match first. Not at the same time, but they both do match on the same string.

      Adding a '$' anchor does not change the symptoms, and was left out of the example.

      Perhaps a stepwise commented example would make it clear what my issue is.

      $1 should DEFINITELY not be 'c'!
      Where did the 'a' go?


      Compare with:

      And this time, $1 was handled sensibly.