Re: positions of all regexp matches

I'm tired. My brain hurts. There must be something I've overlooked in perldoc perlvar. Can anyone explain the difference, if there is one, between merlyn's solution

push @matches, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;

and this, which I came up with myself (heavily adapted, but... :):

push @matches, [$-[0], $+[1]] while $string =~ /(?=($regex))/g;

thx

dave


Replies are listed 'Best First'.
Re: Re: positions of all regexp matches by bart (Canon) on Oct 14, 2003 at 20:00 UTC
Since `$1` starts at the very beginning of the match, `$-[0]` (left edge of the whole match) and `$-[1]` (left edge of `$1`) have the same value. However, a lookahead assertion doesn't contribute to the width of the match, so the whole match here has length zero and `$+[0] == $-[0]`, while `$1` inside the lookahead extends further to the right, so `$+[1] > $-[1]`. That is why both versions push the same pairs.
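The claim that the two start offsets coincide can be checked directly. Here is a minimal sketch that runs both one-liners over the same input; the string `'aaaa'` and the pattern `qr/aa/` are made-up sample data (not from the original posts), chosen so that the matches overlap.

```perl
use strict;
use warnings;

# Hypothetical sample data, picked so that matches overlap.
my $string = 'aaaa';
my $regex  = qr/aa/;

my (@by_group, @by_whole);

# merlyn's version: both offsets come from capture group 1.
push @by_group, [$-[1], $+[1]] while $string =~ /(?=($regex))/g;

# dave's version: start of the whole (zero-width) match, end of group 1.
push @by_whole, [$-[0], $+[1]] while $string =~ /(?=($regex))/g;

# Print each list of [start,end] pairs on its own line.
print join(' ', map { sprintf '[%d,%d]', @$_ } @by_group), "\n";
print join(' ', map { sprintf '[%d,%d]', @$_ } @by_whole), "\n";
```

Both lines should print `[0,2] [1,3] [2,4]`. Replacing `$+[1]` with `$+[0]` would give zero-length pairs instead, because the lookahead adds nothing to the width of the whole match.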

I can imagine this explanation is a bit abstract, so I'll give a different example. Here I'm trying to match an uppercase letter that is the first of a sequence of at least 3 letters, including itself.

$_ = 'Ab1Cd2eFg3Hijk4LMN';
/((?=([a-zA-Z]{3,}))[A-Z])/ or die "No match";

print <<"END";
Whole match: \$&: '$&' $-[0] upto $+[0]
Parens around everything matched: \$1: '$1' $-[1] upto $+[1]
Lookahead matched: \$2: '$2' $-[2] upto $+[2]
END
Resulting in:

Whole match: $&: 'H' 10 upto 11
Parens around everything matched: $1: 'H' 10 upto 11
Lookahead matched: $2: 'Hijk' 10 upto 14

As you can see, the parens around everything matched the same text as the whole match itself. The lookahead extends beyond that, and even though you can capture what it matched, it doesn't count towards the length of the match. The first character matched after the lookahead is still the first character of the whole match.
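As a follow-up sketch (not part of the original reply), the same pattern can be run with `/g` to collect every such uppercase letter, using `substr` with `$-[2]` and `$+[2]` to slice the lookahead's text back out of the string. The variable names and the hash layout are illustrative assumptions.

```perl
use strict;
use warnings;

my $text = 'Ab1Cd2eFg3Hijk4LMN';   # same sample string as in the example above
my @hits;

# Same pattern as above, applied repeatedly with /g.
while ($text =~ /((?=([a-zA-Z]{3,}))[A-Z])/g) {
    # Recover the lookahead capture from its offsets alone.
    my $run = substr($text, $-[2], $+[2] - $-[2]);
    push @hits, {
        letter  => $1,       # the uppercase letter actually consumed
        start   => $-[1],    # equals $-[0]
        end     => $+[1],    # equals $+[0]
        run     => $run,     # same text as $2
        run_end => $+[2],    # lies beyond the end of the whole match
    };
}

printf "%s at %d..%d, run '%s' ends at %d\n",
    @{$_}{qw(letter start end run run_end)} for @hits;
```

For this string the loop should report `H` at 10..11 with run `Hijk` ending at 14, and `L` at 15..16 with run `LMN` ending at 18. The `e` at offset 6 starts a three-letter run too, but it is not uppercase, so `[A-Z]` rejects it.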
