in reply to A bug in Perl regex(?)

This might be somewhat buggy behavior, but here is how I am interpreting the events. Because of your {2}, the pattern you are ultimately trying to match is /((\w+))((\w+))/. However, as YAPE::Regex::Explain points out,
NOTE: because you are using a quantifier on this capture, only the LAST repetition of the captured pattern will be stored in [$2]
On the first time through, \w+ grabs both letters and stores them in $2, and prints ab as expected. It then sees the repetition at the end and shifts the pointer for the second buffer to the second repetition (what would be $4 in my unrolled version). This means that when the first attempt fails and you grab a on your second attempt, $2 doesn't point there anymore; it points to the second buffer in the second iteration.
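The YAPE note on its own is easy to check without any (?{}) block. A minimal sketch, where the variable names $matched and $last are mine and not from the original post:

```perl
use strict;
use warnings;

# The group (\w) runs twice because of {2}; only the last repetition
# is kept in $1 once the match has finished.
my $matched = "ab" =~ /(\w){2}/;
my $last    = $1;

print "\$1=$last\n" if $matched;   # prints $1=b
```

This only shows the final state of the buffer after a successful match. It says nothing about what $1 holds part-way through the match, which is the point under dispute here.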

I think (though this is subject to argument) that the correct behavior should not be what you claim, but should be

    $2 not defined
    $2 not defined
    $2=b
since the final $2 buffer is not populated until your second iteration. In the end, it just goes to emphasize perlretut's warning:
Be warned that this feature is considered experimental, and may be changed without notice.

Re^2: A bug in Perl regex(?)
by ikegami (Patriarch) on Feb 18, 2011 at 16:21 UTC
      Please expound on why you believe this should be the output. To my understanding, the regular expression specification defines output but not method. If there is a specification or archived developer discussion I am unaware of, I would appreciate the citation. Otherwise, I do not see a compelling argument for "it should be the last thing matched by those 'physical' parentheses" over my proposal.

        A successful capture should not be undef. Whether it's documented or not, $1 and friends are intended to be available to (?{}) and (??{}) blocks, so they must be set ASAP.

        1. \w+ matches "ab"
        2. () sets $2
        3. Print $2
        4. Second \w+ fails to match: backtrack
        5. \w+ matches "a"
        6. () sets $2 (This isn't happening)
        7. Print $2
        8. \w+ matches "b"
        9. () sets $2
        10. Print $2
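The steps above can be traced by recording the match position along with $2 each time the code block runs. This is only a sketch: @trace and $ok are names I made up, and what gets recorded for $2 on the second call is exactly what is in question, so it may differ between perl versions.

```perl
use strict;
use warnings;

# @trace is a hypothetical package variable; 'our' avoids the closure
# problems older perls had with lexicals inside (?{ ... }).
our @trace;

# Inside (?{ ... }), $_ is the target string, so pos() reports the
# current match position.
my $ok = "ab" =~ /((\w+)(?{
    push @trace, sprintf("pos=%d %s", pos(), defined $2 ? "\$2=$2" : "\$2 not defined");
})){2}/;

print "$_\n" for @trace;
```

Comparing the recorded positions with the $2 values shows whether the buffer was refreshed after the backtrack in step 4.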

        I do not see a compelling argument for "it should be the last thing matched by those 'physical' parentheses" over my proposal.

        If that's true, then you would expect either of

        $2=ab  # Only thing matched
        $2=a   # Only thing matched
        $2=a   # First thing matched

        or

        $2=ab  # Only thing matched
        $2=a   # Only thing matched
        $2=b   # Last thing matched

        and the code is still buggy as it produces neither.

        Clearer bug demonstration:

        $ perl -e'"ab" =~ /((\w+)(?{print defined $^N ? "\$^N=$^N\n" : "\$^N not defined\n"})){2}/;'
        $^N=ab
        $^N not defined
        $^N=b

        It seems I was mistaken. Here is a correction to my previous reasoning.

        Let's rewrite the regex

        'ab' =~ /((\w+)(?{print defined $2 ? "\$2=$2\n" : "\$2 not defined\n"})){2}/;

        as

        ((\w+)(?{print...}))((\w+)(?{print...}))

        After all, \w{2} is equivalent to \w\w, right? But we assume that the second copy of the regex also sets the same $1 and $2 (not $3 and $4). The current position in the regex is marked with |.

        1. First (\w+) captures all the text:
        ((\w+) | (?{print...}))((\w+)(?{print...}))
        $2 receives the value 'ab', eval prints $2=ab.

        2. Then we enter the second copy of (\w+):
        ((\w+)(?{print...}))(( | \w+)(?{print...}))
        $2 (and also $+, $^N, \2) becomes undefined.

        3. We see that \w does not match, so we backtrack:
        ((\w+ | )(?{print...}))((\w+)(?{print...}))
        We enter the first copy of (\w+) from right to left, and $2 is again set to undefined.

        4. \w+ gives back the letter b (but $2 remains undefined, because we never moved back to the left of the opening parenthesis for $2):
        (( | \w+)(?{print...}))((\w+)(?{print...}))
        $2 remains undefined.

        5. (\w+) captures nothing, because we never moved back to the left of the opening parenthesis for $2:
        ((\w+) | (?{print...}))((\w+)(?{print...}))
        $2 remains undefined. The eval prints "$2 not defined".

        6. The second copy of (\w+) captures the letter b:
        ((\w+)(?{print...}))((\w+) | (?{print...}))
        The eval prints $2=b. Match successful.
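For comparison, here is the literally unrolled pattern, where the second copy gets its own buffers $3 and $4 instead of reusing $1 and $2. A minimal sketch; the array name @groups is mine:

```perl
use strict;
use warnings;

# In list context the match returns the contents of $1..$4.
my @groups = "ab" =~ /((\w+))((\w+))/;

# The first \w+ initially takes "ab", the second copy fails, and after
# backtracking the first copy keeps "a" while the second takes "b".
print join(", ", map { "\$" . ($_ + 1) . "=$groups[$_]" } 0 .. $#groups), "\n";
# prints $1=a, $2=a, $3=b, $4=b
```

With separate buffers nothing is shared between the two copies, so the question of when $2 is cleared or refilled only arises in the {2} form.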