
You are correct that the OP is confused about the numbering of the capture buffers, but there's something a little odd going on here with the greedy + (in my mind).
#!/usr/bin/perl
use 5.10.0;
my $re = qr/(?:(simple).*?)+/;
my $string = "This is a simple thing just a simple simple thing.";
$string =~ /$re/g;
say $&;
outputs
simple
but changing line 3 to
my $re = qr/(?:(simple).*?){3}/;
outputs
simple thing just a simple simple
Why is the repeat failing? Is it because the non-greediness of the inner term somehow trumps the greediness of the outer?


Re^3: Matching a regular expression group multiple times
by AppleFritter (Vicar) on Aug 12, 2014 at 18:09 UTC

    Why is the repeat failing? Is it because the non-greediness of the inner term somehow trumps the greediness of the outer?

    Yes, this is due to the way the regex engine works. Perl will match the literal string "simple", and then match any number of characters, but as few as possible (.*?), subject to the constraints imposed by the rest of the pattern. But there IS no rest of the pattern; so there are no constraints, and Perl does its utmost and matches zero extra characters.

    Only after this is done does the + quantifier kick in, but since it finds that there isn't another literal "simple" following what was already matched, nothing further is matched, and the entire match consists only of the initial "simple" followed by the empty string that the .*? matched.

    Wait, I hear you say, there is more to the pattern! The + itself surely follows? However, that's not how the regex engine works; the + is part of the pattern currently being matched, and the fact that it trails the non-capturing group is a mere artifact of Perl's regex syntax. It helps to think of the + as being at the front of that group instead, where you'd also find other modifiers (e.g. (?i:...)).

    So there is no pattern following the first, and Perl isn't cunning enough to match a bigger part of the string. Neither should it be: in order to do so, it'd have to ignore what you're explicitly telling it to do (match any number of characters, but as few as possible) just to be able to match more later on. And how would it know that this is what you wanted, anyway? Perl is a DWIMmy language, but it can't read minds yet. ;)
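To see what a "rest of the pattern" does to the lazy .*?, here is a minimal sketch. It assumes a variant of the test string with numbered words (s1mple, s2mple, s3mple) so the repetitions can be told apart, and it wraps the group in an extra pair of capturing parentheses instead of using $&. The trailing \. is purely an illustrative choice of something that must still match after the group.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

# Assumed test string: numbered words make each repetition distinguishable.
my $string = 'This is a s1mple thing just a s2mple s3mple thing.';

# Nothing follows the group, so each lazy .*? settles for zero characters
# and + stops after a single iteration.
my ($bare) = $string =~ /((?:s\dmple.*?)+)/;

# A literal full stop after the group is a constraint that still has to be
# met, so the engine backtracks into .*? and lets it grow until \. can match.
my ($constrained) = $string =~ /((?:s\dmple.*?)+)\./;

say $bare;         # s1mple
say $constrained;  # s1mple thing just a s2mple s3mple thing
```

The only difference between the two patterns is the trailing \., which is enough to make the group repeat three times.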

    The regex engine's inner workings are explained in detail in chapter 5 of Programming Perl, BTW, in the section titled "The Little Engine That /Could(n't)?/".

Re^3: Matching a regular expression group multiple times
by AnomalousMonk (Archbishop) on Aug 12, 2014 at 18:20 UTC

    In the qr/(?:(simple).*?)+/ regex, .*? is satisfied with nothing, so it's happy. Then (?:pattern)+ is satisfied with a single 'simple'. If there were more simple... sequences immediately following, greedy + would try to grab them, but there aren't, so it don't. If + is satisfied with what it has, it can't force preceding satisfied assertions to fail.

    In the qr/(?:(simple).*?){3}/ regex, the {3} quantifier cannot be satisfied until it forces the preceding .*? to grab a bunch more stuff.

    (I've removed the /g modifier in these examples because it just confuses the issue.)

    c:\@Work\Perl\monks>perl -wMstrict -lE
    "my $re = qr/(?:(s \d mple).*?)+/x;
     my $string = 'This is a s1mple thing just a s2mple s3mple thing.';
     $string =~ $re;
     say $&;
     ;;
     my $string2 = 'This is a s1mples2mples3mple thing';
     $string2 =~ $re;
     say $&;
     ;;
     $re = qr/(?:(s \d mple).*?){3}/x;
     $string =~ $re;
     say $&;
    "
    s1mple
    s1mples2mples3mple
    s1mple thing just a s2mple s3mple
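As a follow-up on the capture-buffer side of the question, the sketch below reuses the numbered test string from the example above. It illustrates that a quantified capture group holds only the text from its final iteration, and shows one common way (a plain /g match in list context) to collect every occurrence instead. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my $string = 'This is a s1mple thing just a s2mple s3mple thing.';

# The group runs three times, but there is still only one capture buffer,
# and it keeps what the last iteration matched.
my ($last) = $string =~ /(?:(s\dmple).*?){3}/;

# In list context, /g returns the captured text of every successive match.
my @all = $string =~ /(s\dmple)/g;

say $last;    # s3mple
say "@all";   # s1mple s2mple s3mple
```

Whether the /g approach fits depends on what the OP actually needs; it drops the requirement that the occurrences form one contiguous match.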