in reply to Unexpected regular expression match

You seem to think backtracking cannot occur within (?>PAT). That's not the case at all.

The purpose of (?>PAT) is to prevent the regex engine from trying to get PAT to match something else once it has already matched something. In short, backtracking through (?>PAT) causes it to fail.
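
Here is a minimal sketch of that distinction. The strings and patterns are made up for illustration and are not from the original question. The first test backtracks inside the group while it is still being matched. The other two show what happens when the engine has to backtrack through a group that has already matched.

```perl
use strict;
use warnings;

# Within the group: "a+ b" fails on "aac", so the engine backtracks
# inside (?>...) and tries the second alternative, which succeeds.
my $within = "aac" =~ /^ (?> a+ b | a+ ) c $/x ? 1 : 0;

# Through the group: (?> a+ ) has already matched "aaa" and will not
# give an "a" back, so the trailing "a" can never match.
my $atomic = "aaa" =~ /^ (?> a+ ) a $/x ? 1 : 0;

# Same pattern without the atomic group: a+ gives one "a" back.
my $plain = "aaa" =~ /^ (?: a+ ) a $/x ? 1 : 0;

print "within=$within atomic=$atomic plain=$plain\n";
```

This should print within=1 atomic=0 plain=1.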

You want

/^\d+ _\d+ (?> (?:_\d+)? ) \w+ $/x

which can also be written as

/^\d+ _\d+ (?:_\d+)?+ \w+ $/x

For "1_0_1" =~ /^\d+ _\d+ (?> (?:_\d+)? ) \w+ $/x, everything is straightforward until /\w+/ fails to match. At that point, the regex engine starts to backtrack.

  1. Backtracking through (?>...) causes it to fail (as always).
  2. Backtracking through the second \d+ causes it to fail (since it previously matched only one digit).
  3. Backtracking through _ causes it to fail (as always).
  4. Backtracking through the first \d+ causes it to fail (since it previously matched only one digit).
  5. Backtracking through ^ causes it to fail (as always).
  6. The match fails.
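
The walkthrough above can be checked with a short script. This is only a sketch: "1_0_1x" is an input I made up so that \w+ has something left to match, and the variable names are arbitrary.

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers need Perl 5.10 or later

# Both spellings of the proposed pattern: atomic group and possessive "?+".
my $atomic     = qr/^\d+ _\d+ (?> (?:_\d+)? ) \w+ $/x;
my $possessive = qr/^\d+ _\d+ (?:_\d+)?+ \w+ $/x;

for my $str ("1_0_1", "1_0_1x") {
    my $r_atomic     = $str =~ $atomic     ? "match" : "no match";
    my $r_possessive = $str =~ $possessive ? "match" : "no match";
    print "$str: atomic=$r_atomic possessive=$r_possessive\n";
}
```

Both spellings should reject "1_0_1" and accept "1_0_1x".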

Re^2: Unexpected regular expression match
by ikegami (Patriarch) on Jan 26, 2012 at 00:43 UTC

    Once one backtracks through (?>PAT), the regex engine is free to try to match PAT at a different location (or maybe even at the same location) if backtracking ended successfully earlier in the pattern.

    This causes my proposed solution to fail for "2_34_5".

    Staying with the same approach, this can be fixed by also preventing backtracking through the earlier \d+.

    /^\d++ _ \d++ (?:_ \d+)?+ \w+ $/x

    One can also solve this without any (?>...) at all by being more precise with the definitions.

    /^\d+ _ \d+ (?:_ \d+)? (?![\d_])\w+ $/x
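
    The three patterns can be compared side by side with a sketch like the following. The hash keys are just labels I picked, and "2_34_5x" is a made-up input that all three patterns should accept.

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers need Perl 5.10 or later

my %pattern = (
    first_try  => qr/^\d+ _\d+ (?:_\d+)?+ \w+ $/x,
    possessive => qr/^\d++ _ \d++ (?:_ \d+)?+ \w+ $/x,
    lookahead  => qr/^\d+ _ \d+ (?:_ \d+)? (?![\d_])\w+ $/x,
);

for my $str ("2_34_5", "2_34_5x") {
    for my $name (sort keys %pattern) {
        my $verdict = $str =~ $pattern{$name} ? "match" : "no match";
        print "$str $name: $verdict\n";
    }
}
```

    For "2_34_5", only first_try should report a match: the second \d+ gives back the "4", the optional group then matches nothing, and \w+ takes "4_5". The other two patterns should reject it.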
Re^2: Unexpected regular expression match
by GrandFather (Saint) on Jan 26, 2012 at 02:34 UTC

    Thank you, that makes sense. I did indeed think backtracking could not occur within (?>PAT). This may be the first time I've tried to use (?>PAT), and I can't say that reading the documentation really helped me understand where backtracking was being suppressed.

    The quantifier+ (possessive quantifier) syntax is new to me. Any idea when it was introduced (5.10 maybe)? The phrase "give nothing back" in the documentation makes the possessive quantifier (and by implication (?>PAT) ) much easier to understand in my view.

    True laziness is hard work

      I'll make sure to use "give nothing back" in the future.

      5.10.1 did have it, so yeah, it was surely introduced in 5.10.0.