in reply to positions of all regexp matches

I'm tired. My brain hurts. There must be something I've overlooked in perldoc perlvar. Can anyone explain the difference, if there is one, between merlyn's solution

push @matches, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;

and this, which I came up with myself (heavily adapted, but... :):

push @matches, [$-[0], $+[1]] while $string =~ /(?=($regex))/g;

thx

dave

Re: Re: positions of all regexp matches
by bart (Canon) on Oct 14, 2003 at 20:00 UTC
    Since the captured $1 starts at the very beginning of the match, $-[0] (left edge of the whole match) and $-[1] (left edge of $1) have the same value, so the two snippets push identical pairs. The right edges are what differ: a lookahead assertion is zero-width, so the whole match has length zero and $+[0] == $-[0], while $1 inside the lookahead extends further to the right, so $+[1] > $-[1] whenever $regex matches at least one character.
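
To check this concretely, here is a minimal sketch that runs both snippets against the same input and compares the pairs they collect. The sample string 'aaaa' and the pattern 'aa' are made up for illustration (chosen so that matches overlap); they are not from the original thread.

```perl
use strict;
use warnings;

my $string = 'aaaa';   # hypothetical sample input
my $regex  = 'aa';     # hypothetical pattern; its matches overlap in $string

my (@by_group, @by_whole);

# merlyn's form: start and end both taken from capture group 1
push @by_group, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;

# dave's form: start taken from the zero-width whole match instead
push @by_whole, [$-[0], $+[1]] while $string =~ /(?=($regex))/g;

# Render each list of [start, end] pairs as "start-end start-end ..."
my $group_pairs = join ' ', map { join('-', @$_) } @by_group;
my $whole_pairs = join ' ', map { join('-', @$_) } @by_whole;

print "using \$-[1]: $group_pairs\n";
print "using \$-[0]: $whole_pairs\n";
print "identical\n" if $group_pairs eq $whole_pairs;
```

Both lines should read 0-2 1-3 2-4 for this input. The second loop starts again from the beginning of the string because the failed match that ends the first loop resets pos($string).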

    I can imagine this explanation is a bit abstract, so I'll give a different example. Here I'm trying to match an uppercase letter that is the first of a sequence of at least 3 letters, including itself.

    $_ = 'Ab1Cd2eFg3Hijk4LMN';
    /((?=([a-zA-Z]{3,}))[A-Z])/ or die "No match";
    print <<"END";
    Whole match: \$&: '$&' $-[0] upto $+[0]
    Parens around everything matched: \$1: '$1' $-[1] upto $+[1]
    Lookahead matched: \$2: '$2' $-[2] upto $+[2]
    END
    Resulting in:
    Whole match: $&: 'H' 10 upto 11
    Parens around everything matched: $1: 'H' 10 upto 11
    Lookahead matched: $2: 'Hijk' 10 upto 14
    

    As you can see, the parens around everything matched the same as the whole match itself. The lookahead extends beyond that, and even though you can capture what it matched, it doesn't count for the length. The first thing after the lookahead is still the same thing as the first thing of the whole match.
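
The same pattern can be run with /g to see the offsets for every match rather than just the first. This is only a sketch extending the example above; the @report array and the output format are my own additions.

```perl
use strict;
use warnings;

my $text = 'Ab1Cd2eFg3Hijk4LMN';   # same sample string as in the example above
my @report;

while ($text =~ /((?=([a-zA-Z]{3,}))[A-Z])/g) {
    # $-[1]/$+[1] delimit the single consumed letter,
    # $-[2]/$+[2] delimit what the lookahead captured.
    push @report, sprintf "'%s' %d..%d, lookahead '%s' %d..%d",
        $1, $-[1], $+[1], $2, $-[2], $+[2];
}

print "$_\n" for @report;
```

This should print two lines, one for 'H' (10..11, lookahead 'Hijk' 10..14) and one for 'L' (15..16, lookahead 'LMN' 15..18). In both, the lookahead capture starts at the same offset as the consumed letter but ends further right.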