in reply to Matching a regular expression group multiple times

The capturing parentheses in a regex expression like qr/(?:(simple).*?)+/ always capture to the same capture group, no matter how many times they may be 'repeated' by a quantifier. Which capture group (by number) they capture to is determined by the position of the capturing parentheses in the final regex. After interpolation, the statement
    $string =~ /$re$re$re/g;
looks like
    $string =~ /(?:(simple).*?)+(?:(simple).*?)+(?:(simple).*?)+/g;
which clearly contains three sets of capturing parentheses, capturing to $1, $2 and $3 respectively.
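
To make the numbering visible, here is a minimal sketch (not from the original post) that borrows the s\dmple trick so each capture is distinguishable. The variable names @one, @three and @last are just for illustration.

```perl
use strict;
use warnings;

my $re     = qr/(?:(s\dmple).*?)+/;
my $string = 'This is a s1mple thing just a s2mple s3mple thing.';

# One copy of $re: one pair of capturing parentheses, so only $1 exists.
my @one = $string =~ /$re/;

# Three copies of $re: three pairs of capturing parentheses, so $1, $2, $3.
my @three = $string =~ /$re$re$re/;

# One pair of parentheses repeated by {3}: still only $1, and it holds
# whatever the last repetition captured.
my @last = $string =~ /(?:(s\dmple).*?){3}/;

print "one copy:     @one\n";
print "three copies: @three\n";
print "{3} repeat:   @last\n";
```

The list-context match returns one element per capture group in the pattern, which is why the first and third lists have a single element each.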

Perhaps a way to do what you want is:

c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my $re = qr/(?:(simple).*?)+/;
 ;;
 my $string = 'This is a simple string, just a simple simple thing.';
 my @captures = $string =~ /$re/g;
 dd \@captures;
"
["simple", "simple", "simple"]

Update: This particular example can be expressed even more simply as:

c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my $re = qr/simple/;
 ;;
 my $string = 'This is a simple string, just a simple simple thing.';
 my @captures = $string =~ /$re/g;
 dd \@captures;
"
["simple", "simple", "simple"]

Re^2: Matching a regular expression group multiple times
by kennethk (Abbot) on Aug 12, 2014 at 17:37 UTC
    You are correct that the OP is confused about the numbering of the capture buffers, but there's something a little odd going on here with the greedy + (in my mind).
    #!/usr/bin/perl
    use 5.10.0;
    my $re = qr/(?:(simple).*?)+/;
    my $string = "This is a simple thing just a simple simple thing.";
    $string =~ /$re/g;
    say $&;
    outputs
    simple
    but changing line 3 to
    my $re = qr/(?:(simple).*?){3}/;
    outputs
    simple thing just a simple simple
    Why is the repeat failing? Is it because the non-greediness of the inner term somehow trumps the greediness of the outer?

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Why is the repeat failing? Is it because the non-greediness of the inner term somehow trumps the greediness of the outer?

      Yes, this is due to the way the regex engine works. Perl will match the literal string "simple", and then match any number of characters, but as few as possible (.*?), subject to the constraints imposed by the rest of the pattern. But there IS no rest of the pattern; so there are no constraints, and Perl does its utmost and matches zero extra characters.

      Only after this is done does the + quantifier kick in, but since it finds that there isn't another literal "simple" following what was already matched, nothing further is matched, and the entire match consists only of the initial "simple" followed by the empty string that the .*? matched.

      Wait, I hear you say, there is more to the pattern! The + itself surely follows? However, that's not how the regex engine works; the + is part of the pattern currently being matched, and the fact that it trails the non-capturing group is a mere artifact of Perl's regex syntax. It helps to think of the + as being at the front of that group instead, where you'd also find other modifiers (e.g. (?i:...)).

      So there is no pattern following the first, and Perl isn't cunning enough to match a bigger part of the string. Neither should it be: in order to do so, it'd have to ignore what you're explicitly telling it to do (match any number of characters, but as few as possible), just to be able to match more later on. And how would it know that this is what you wanted, anyway? Perl is a DWIMmy language, but it can't read minds yet. ;)

      The regex engine's inner workings are explained in detail in chapter 5 of Programming Perl, BTW, in the section titled "The Little Engine That /Could(n't)?/".
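
      A small sketch of the "rest of the pattern" point, assuming the same test string as above. The trailing \. and the outer capturing parentheses are my additions for illustration, not part of the original regex.

```perl
use strict;
use warnings;

my $string = 'This is a simple thing just a simple simple thing.';

# Nothing follows the group: .*? settles for the empty string, and +
# finds no second "simple" right away, so one pass is all we get.
my ($bare) = $string =~ /((?:simple.*?)+)/;

# A literal period must follow the group: now .*? has to stretch (and +
# gets to repeat) until the overall match can succeed.
my ($constrained) = $string =~ /((?:simple.*?)+)\./;

print "bare:        '$bare'\n";
print "constrained: '$constrained'\n";
```

      With a constraint after the group, the engine backtracks into .*? and lets it grow one character at a time, which is also what the {3} quantifier forces in the example above.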

      In the qr/(?:(simple).*?)+/ regex, .*? is satisfied with nothing, so it's happy. Then (?:pattern)+ is satisfied with a single 'simple'. If there were more simple... sequences immediately following, greedy + would try to grab them, but there aren't, so it doesn't. If + is satisfied with what it has, it can't force preceding satisfied assertions to fail.

      In the qr/(?:(simple).*?){3}/ regex, the {3} quantifier cannot be satisfied until it forces the preceding .*? to grab a bunch more stuff.

      (I've removed the /g modifier in these examples because it just confuses the issue.)

      c:\@Work\Perl\monks>perl -wMstrict -lE
      "my $re = qr/(?:(s \d mple).*?)+/x;
       my $string = 'This is a s1mple thing just a s2mple s3mple thing.';
       $string =~ $re;
       say $&;
       ;;
       my $string2 = 'This is a s1mples2mples3mple thing';
       $string2 =~ $re;
       say $&;
       ;;
       $re = qr/(?:(s \d mple).*?){3}/x;
       $string =~ $re;
       say $&;
      "
      s1mple
      s1mples2mples3mple
      s1mple thing just a s2mple s3mple