Warning: this node is not for the faint of heart.

Here are two pattern matches:

    "a" =~ /((a)*)*/ and printf("1:%s, 2:%s\n",
        defined($1) ? "'$1'" : "undef",
        defined($2) ? "'$2'" : "undef",
    );
    "ab" =~ /((a)|(b))*/ and printf("1:%s, 2:%s, 3:%s\n",
        defined($1) ? "'$1'" : "undef",
        defined($2) ? "'$2'" : "undef",
        defined($3) ? "'$3'" : "undef",
    );

First guess what they store in the $DIGIT variables. Then run them and examine the results. Do you know why the capture-variables hold the strings they do? I'll post the answer later. (I'm currently in the middle of my fraternity's Initiation Week, and am busy running events.)
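The same experiment can be run a little more compactly. This is a sketch, not the original poster's code: a successful match in list context returns ($1, $2, ...), with undef for any group that did not take part, so one small helper (the name `show_captures` is hypothetical) can format each result.

```perl
use strict;
use warnings;

# Hypothetical helper: format a capture list as "1:'x', 2:undef, ..."
sub show_captures {
    my @caps = @_;
    my @parts;
    for my $i (0 .. $#caps) {
        my $v = $caps[$i];
        push @parts, ($i + 1) . ":" . (defined $v ? "'$v'" : "undef");
    }
    return join ", ", @parts;
}

my @cases = (
    [ "a",  qr/((a)*)*/    ],
    [ "ab", qr/((a)|(b))*/ ],
);

for my $case (@cases) {
    my ($string, $re) = @$case;
    # List assignment in boolean context is true when the match succeeded.
    if (my @caps = $string =~ $re) {
        print qq{"$string": }, show_captures(@caps), "\n";
    }
}
```

Under `use warnings`, Perl may also warn that the first pattern "matches null string many times"; that warning is a hint about a zero-length iteration of the outer `*`.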

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

(tye)Re: Interesting Regex Behavior
by tye (Sage) on Nov 22, 2002 at 16:52 UTC

    See the node "RE: Re: ^x* vs x*$" for a previous discussion of this.

    I'd vote for this behavior to change in Perl 6 such that adjacent matches must not end at the same point (they already aren't allowed to start at the same point).
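In Perl 5 as it stands, one way to sidestep the problem (a workaround sketched here, not part of tye's proposal) is to write the repeated group so that it cannot match the empty string. Every iteration of the outer `*` then has to consume input, so there is no final zero-length iteration to overwrite the captures. The two patterns below differ only in the inner quantifier:

```perl
use strict;
use warnings;

# Original pattern: the outer * may take one extra, zero-length
# iteration at the end of the string, which overwrites $1 and $2.
my @star = "a" =~ /((a)*)*/;

# Variant: the inner group needs at least one "a", so every iteration
# of the outer * consumes input and no empty final iteration can occur.
my @plus = "a" =~ /((a)+)*/;

for my $result ([ "((a)*)*", \@star ], [ "((a)+)*", \@plus ]) {
    my ($label, $caps) = @$result;
    printf "%-8s 1:%s, 2:%s\n", $label,
        map { defined $_ ? "'$_'" : "undef" } @$caps;
}
```

Here both patterns match the same strings; in general the rewrite is only safe when an empty iteration carries no meaning for the captures you care about.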

    Introducing such an improvement in Perl 5 will be harder due to backward compatibility, but I wouldn't mind seeing that happen. I just haven't come up with a good design for how to make the new behavior optional.

            - tye
Re: Interesting Regex Behavior
by broquaint (Abbot) on Nov 22, 2002 at 15:26 UTC
    My amateurish guesses are as follows:

    In the first pattern match, because you're using the * quantifier on the first set of parens, nothing is matched because it's a lazy quantifier, so $1 gets nothing, and $2 is undefined since it isn't even executed.

    In the second pattern match, the * goes to the end of the string, matches 'b', backtracks to the 'a', and then goes forward to the 'b', which respectively explains the $DIGITs.

    I suspect both guesses are pretty close to the truth, but I couldn't say for sure not being at one with the perl regex engine and all ;)
    HTH

    _________
    broquaint

      You're close, but a little off.

      In the first example, the quantifiers aren't lazy, but greedy, and THAT is why $2 is undef and $1 holds the empty string. First, the (a)* matches the "a", and stores "a" in $2, and so $1 is "a" as well. Then the outermost * makes the capturing block try again, and this time (a)* matches ZERO "a"s. Here's the trick: "y" =~ /(x)?/ stores undef in $1, and it succeeds. Therefore, $2 becomes undef, and $1 becomes the empty string.
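Both of japhy's points can be checked directly. The sketch below (variable names are illustrative) first shows that `"y" =~ /(x)?/` succeeds with `$1` undefined, then uses the `@-` and `@+` arrays, which hold the start and end offsets of the whole match (index 0) and of each group, to show where the final value of `$1` came from in `"a" =~ /((a)*)*/`.

```perl
use strict;
use warnings;

# Point 1: a group quantified with ? that matches zero times is left
# undefined, yet the match as a whole still succeeds.
my $opt_matched = "y" =~ /(x)?/ ? 1 : 0;
my $opt_cap     = $1;
printf "(x)? matched: %s; \$1 is %s\n",
    $opt_matched ? "yes" : "no",
    defined $opt_cap ? "'$opt_cap'" : "undef";

# Point 2: $-[N] and $+[N] are the start and end offsets of group N.
my $str = "a";
my ($whole, $one_start, $one_end, $two_defined);
if ($str =~ /((a)*)*/) {
    $whole = substr($str, $-[0], $+[0] - $-[0]);   # text of the whole match
    ($one_start, $one_end) = ($-[1], $+[1]);       # where $1 was captured
    $two_defined = defined $2 ? 1 : 0;             # did group 2 keep a value?
    printf "whole match: '%s'; \$1 = '%s' at offsets %d..%d; \$2 is %s\n",
        $whole, $1, $one_start, $one_end, $two_defined ? "defined" : "undef";
}
```

The whole match is "a", while `$1` is an empty string sitting at offset 1, after the "a": it was captured by the extra zero-length iteration, not at the start of the string.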

      The second regex works thus. First, the (a) matches the "a" ($1 is "a", $2 is "a", and $3 is undef). Then, on the second pass, the (b) matches the "b", but that does NOT reset $2 to undef, even though (a) didn't match this time. Therefore, $1 is "b" (note: "japhy" =~ /(\w)+/ stores "y" in $1), $2 is "a", and $3 is "b".
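A short sketch of the second point, using only core Perl: a quantified capture group reports only its last iteration, and a group that sat out that iteration keeps whatever an earlier iteration gave it. When every iteration is wanted, one common alternative (not something discussed in the thread) is a global match in list context.

```perl
use strict;
use warnings;

# A quantified capture group keeps only its last iteration:
# (\w)+ captures j, a, p, h, y in turn and ends up holding "y".
my ($last) = "japhy" =~ /(\w)+/;

# In /((a)|(b))*/, iteration 1 sets $1 and $2 to "a"; iteration 2 takes
# the (b) branch, setting $1 and $3 to "b" but leaving $2 untouched.
my @caps = "ab" =~ /((a)|(b))*/;

# Alternative when every item is wanted: a global match in list
# context returns one capture per match.
my @each = "ab" =~ /(a|b)/g;

print "last letter: $last\n";
print "captures: ", join(", ", map { defined $_ ? "'$_'" : "undef" } @caps), "\n";
print "every item: @each\n";
```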

Re: Interesting Regex Behavior
by BrowserUk (Patriarch) on Nov 22, 2002 at 16:49 UTC

    My speculation for the first one is: as the entire regex is optional (*), it matches the (implied) null string at the beginning of the bound string, so the match is true and the outer capture is that null string, hence $1=''. However, the inner capture cannot be satisfied from the outer capture, so it is undef.

    For the second: whilst the outer capture group is optional, it contains non-optional inner groupings, therefore an attempt must be made to satisfy those inner groupings before the truth of the match can be determined. As the outer capture is an or condition, and the first character in the string matches the left-most element of that or, the match is made and no further attempt is needed. At this point, the left-most inner capture group has matched 'a' so $2='a', and no attempt has been made to match the right-most inner capture group, so $3=undef. The outer capture group is whatever matched inside it, so $1='a' as well.

    And I would never have guessed anywhere close until I ran the code.


    Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
    Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
    Get used to the wings fast 'cos it's an 8-hour day...unless the Governor calls for a cyclone or hurricane, in which case 16-hour shifts are mandatory.
    Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

Re: Interesting Regex Behavior
by dws (Chancellor) on Nov 23, 2002 at 04:30 UTC
    I'm currently in the middle of my fraternity's Initiation Week, and am busy running events.

    You're quizzing pledges on strange regex behavior? That's cruel. Most states have banned such hazing.