in reply to Getting + and * to generate multiple captures

Yet another completely different approach is to embed code in the regexp.

# We need use re 'eval' because we use interpolation and (?{...})
# in the same regexp. Beware of the implications of this directive.
use re 'eval';

our @matches;     # Don't use a lexical for this.
local *matches;   # Protect our caller's variables.

/
   (?{ [] })   # Create a stack
   $text
   (?:
      \s
      (\w+)
      (?{ [ @{$^R}, $1 ] })   # Save last match on the stack.
   )+
   (?{ @matches = @{$^R}; })   # Success! Save the result.
/x;

Since Perl 5.8.0, the $1 in the above can be replaced with $^N.

It's possible to simplify the above code since the regexp engine will never backtrack through (?{ [ @{$^R}, $1 ] }) in this particular regexp, but it's much safer to assume there's always the possibility of backtracking through any (?{...}). That's why $^R is used.
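For anyone who wants to run this, here is a minimal self-contained sketch of the same technique, using $^N in place of $1. The wrapper sub words_after and the sample input are my own illustrative names, not part of the original snippet, and I'm assuming a reasonably recent Perl (5.18 or later) so that strict applies cleanly inside the code blocks.

```perl
use strict;
use warnings;
use re 'eval';    # The pattern mixes interpolation with code blocks.

# Hypothetical wrapper (illustrative only): returns the words that
# follow $text in $str, or an empty list if the pattern does not match.
sub words_after {
    my ($text, $str) = @_;

    our @matches;      # Package variable, not a lexical.
    local *matches;    # Start empty and protect the caller's copy.

    $str =~ /
        (?{ [] })                     # Start with an empty stack.
        $text
        (?:
            \s
            (\w+)
            (?{ [ @{$^R}, $^N ] })    # Copy the stack and add the new word.
        )+
        (?{ @matches = @{$^R} })      # Reached only on overall success.
    /x;

    my @result = @matches;    # Copy out before the localization ends.
    return @result;
}

print join(',', words_after('foo', 'foo a b c')), "\n";   # prints: a,b,c
```

Because each code block returns a fresh array reference rather than modifying shared state, backtracking past a block simply restores the previous value of $^R.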

Update: The stack is unnecessarily big in the above code. The following greatly reduces the size of the stack, which probably also speeds things up greatly.

sub flatten_list {
   my ($rv, $p) = @_;
   @$rv = ();
   while ($p) {
      unshift @$rv, $p->[1];
      $p = $p->[0];
   }
}

our @matches;
local *matches;

/
   $text
   (?:
      \s
      (\w+)
      (?{ [ $^R, $1 ] })
   )+
   (?{ flatten_list \@matches, $^R })
/x;
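A runnable sketch of the linked-list version, in case it helps to see it end to end. The wrapper words_after_linked and the sample input are hypothetical names of mine. The leading (?{ undef }) is also my own defensive addition, on the assumption that $^R might otherwise still hold a value from an earlier match; I'm again assuming Perl 5.18 or later.

```perl
use strict;
use warnings;
use re 'eval';    # The pattern mixes interpolation with code blocks.

# Walk the [ previous, value ] chain and rebuild the list in match order.
sub flatten_list {
    my ($rv, $p) = @_;
    @$rv = ();
    while ($p) {
        unshift @$rv, $p->[1];
        $p = $p->[0];
    }
}

# Hypothetical wrapper (illustrative only).
sub words_after_linked {
    my ($text, $str) = @_;

    our @matches;
    local *matches;

    $str =~ /
        (?{ undef })                  # Defensive reset of the chain (my assumption).
        $text
        (?:
            \s
            (\w+)
            (?{ [ $^R, $1 ] })        # New node pointing at the previous one.
        )+
        (?{ flatten_list \@matches, $^R })
    /x;

    my @result = @matches;    # Copy out before the localization ends.
    return @result;
}

print join(',', words_after_linked('foo', 'foo a b c')), "\n";   # prints: a,b,c
```

Each iteration allocates one two-element node instead of copying the whole array, so the work per step no longer grows with the number of captures.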

Re^2: Getting + and * to generate multiple captures
by jgeisler (Initiate) on Aug 17, 2006 at 19:29 UTC
    Thanks, that will do what I want. I'm assuming I need to access @matches explicitly after running the match to grab the values I care about? Can I do something like:
    / $text (?: \s (\w+) (?{ push @matches, $^N }) )+ /x;
    to just populate @matches instead of creating the stack and then flattening it?

      For that very specific regexp, yes. That's the simplification to which I alluded. I'll repeat the reason I didn't post the simplification:

      It's much safer to assume there's always the possibility of backtracking through any (?{...}). That's why $^R is used.

      It's too easy to miss a case where backtracking can occur.

      For example,
      / $text (?: \s (\w+) (?{ push @matches, $^N }) ){2,} / is wrong.
      / $text (?: \s (\w+) (?{ push @matches, $^N }) )+ ... / is wrong.
      / $text (?: \s (\w+) (?{ push @matches, $^N }) ... )+ / is wrong.
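To make the second case concrete, here is a small sketch that puts a trailing subpattern after the loop and compares the plain push with the $^R stack. The sub names, the trailing \s c, and the sample input are all hypothetical choices of mine. With a greedy loop the engine should first consume " c" as a third iteration and only then back off so the trailing part can match; the push from that abandoned iteration is not undone, whereas $^R is restored. I'm assuming Perl 5.18 or later.

```perl
use strict;
use warnings;
use re 'eval';    # The patterns mix interpolation with code blocks.

# Hypothetical helper: collects with a plain push, as in the question.
sub collect_with_push {
    my ($text, $str) = @_;
    our @pushed;
    local *pushed;

    $str =~ /
        $text
        (?: \s (\w+) (?{ push @pushed, $^N }) )+
        \s c                          # Trailing part that forces backtracking.
    /x;

    my @result = @pushed;
    return @result;
}

# Hypothetical helper: same pattern, but the list is kept in the result stack.
sub collect_with_stack {
    my ($text, $str) = @_;
    our @stacked;
    local *stacked;

    $str =~ /
        (?{ [] })
        $text
        (?: \s (\w+) (?{ [ @{$^R}, $^N ] }) )+
        \s c
        (?{ @stacked = @{$^R} })      # Sees only the iterations that survived.
    /x;

    my @result = @stacked;
    return @result;
}

my $input = 'foo a b c';
print "push:  @{[ collect_with_push('foo', $input) ]}\n";
print "stack: @{[ collect_with_stack('foo', $input) ]}\n";
```

The stack version should report only a and b, the two iterations that are part of the final match. The push version is expected to also keep the entry from the iteration that was backtracked out of.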