in reply to A bug in Perl regex(?)

See (your?) post to perl5-porters with the same report. What problems do you have with the replies you got there?

Update: Removed German-specific Google query parameter

Re^2: A bug in Perl regex(?)
by Serge314 (Acolyte) on Feb 19, 2011 at 07:24 UTC

    Thanks, that's my bug report. Here's my answer to Eric Brine:

    Let's rewrite the regex

    'ab' =~ /((\w+)(?{print defined $2 ? "\$2=$2\n" : "\$2 not defined\n"})){2}/;

    as

    ((\w+)(?{print...}))((\w+)(?{print...}))

    \w{2} is equivalent to \w\w, right? But we assume that the second copy of the group also sets the same $1 and $2 (not $3 and $4). The current position in the regex is marked with |.
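The assumption that every repetition of a quantified group writes to the same capture numbers can be checked on its own, without the code block. This is only a minimal sketch: the variable names `$outer` and `$inner` are illustrative, and the pattern uses `\w` rather than `\w+` so that no backtracking is involved.

```perl
use strict;
use warnings;

# Each pass through the quantified group overwrites captures 1 and 2,
# so after the match both hold what the last repetition matched.
# In list context the match returns ($1, $2).
my ($outer, $inner) = 'ab' =~ /((\w)){2}/;

print "\$1=$outer \$2=$inner\n";    # prints: $1=b $2=b
```

There is no $3 or $4 here, which is why the two-copy picture above has to reuse $1 and $2.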

    1. The first (\w+) captures the whole string:
    ((\w+) | (?{print...}))((\w+)(?{print...}))
    $2 receives the value 'ab', and the eval prints $2=ab.

    2. Then we enter the second copy of (\w+):
    ((\w+)(?{print...}))(( | \w+)(?{print...}))
    $2 (and also $+, $^N, \2) becomes undefined.

    3. We see that \w does not match, so we backtrack:
    ((\w+ | )(?{print...}))((\w+)(?{print...}))
    We enter the first copy of (\w+) from right to left, and $2 becomes undefined again.

    4. (\w+) captures the letter a:
    ((\w+) | (?{print...}))((\w+)(?{print...}))
    $2 must receive the value a, but in the current version of Perl $2 is
    undefined... Why? Perhaps the two undefined values are stored for $2 as on a stack,
    then the last value is popped off the stack, and $2 is still undefined?
    Here the eval must print $2=a.

    5. The second copy of (\w+) captures the letter b:
    ((\w+)(?{print...}))((\w+) | (?{print...}))
    The eval prints $2=b. The match is successful.

    Do you see any mistake in this reasoning?
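To compare the five steps above against what a given perl actually does, the code block can record the match position along with $2 instead of only printing. This is a rough sketch under a few assumptions: `@trace` is a name made up for this example, it is a package variable because older perls do not reliably close over lexicals inside (?{...}), and pos() inside the block is taken to be the current match position as described in perlre.

```perl
use strict;
use warnings;

# Hypothetical helper array: one entry per invocation of the code block.
our @trace;

'ab' =~ /
    (
        (\w+)
        (?{ push @trace, sprintf 'pos=%d $2=%s', pos(), defined $2 ? "'$2'" : 'undef' })
    ){2}
/x;

# One line per time the block ran, in the order the engine reached it.
print "$_\n" for @trace;
```

The entry recorded at pos=1 corresponds to step 4: it is the one that shows whether $2 is 'a' or undefined on the perl being tested.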

      Sorry for my poor English.
      After my previous post I thought it over once more, and now I think that, intuitively, $2=undefined is incorrect and $2=a is correct.

      After that I received an email from the regex guru Jeffrey Friedl (regex.info):
      ---

      Hi Serge,
      I've been thinking about this for a while, and as far as I can tell it does seem
      to be a bug. By definition, $2 must be defined before the (?{...}) can run.
      It's probably a problem with how it backtracks. I'd suggest filing a bug report..

      ---
      Splitting the regex into
      ((\w+)(?{print...}))((\w+)(?{print...}))
      is wrong; the regex is not really split.
      After (\w+) captures the whole string:
      (\w+)) | {2}
      we see that the second repetition of \w does not match. We backtrack and enter the second pair of parentheses going from right to left:
      (\w | )+
      In this case the regex engine (I think) sets $2=undefined, but why? Intuitively, it seems $2 should be set to undefined only after we leave the opening second parenthesis going from right to left.

Re^2: A bug in Perl regex(?)
by Anonymous Monk on Feb 19, 2011 at 13:20 UTC