Re: Getting + and * to generate multiple captures

Yet another completely different approach is to embed code in the regexp.

# use re 'eval' is needed (before Perl 5.18) because we mix interpolation
# and (?{...}) in the same regexp. Beware of the security implications of
# this directive.
use re 'eval';

our @matches;                 # Don't use a lexical for this.
local *matches;               # Protect our caller's variables.
/
   (?{ [] })                  # Create a stack
   $text
   (?:
      \s
      (\w+)
      (?{ [ @{$^R}, $1 ] })   # Save last match on the stack.
   )+
   (?{ @matches = @{$^R}; })  # Success! Save the result.
/x;
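To try the code above on its own, here is a minimal self-contained sketch. It assumes the string to search is passed in rather than taken from $_. The sub name captures_via_stack and the sample input are hypothetical. It uses $^N, so it needs Perl 5.8.0 or later.

```perl
use strict;
use warnings;
use re 'eval';    # Needed before Perl 5.18: $text is interpolated next to (?{...}).

our @matches;     # A package variable, not a lexical, as in the original.

# Hypothetical wrapper: returns the words that follow $text, in order.
sub captures_via_stack {
    my ($string, $text) = @_;
    local @matches = ();                  # Protect the caller's @matches.
    $string =~ /
        (?{ [] })                         # Start with an empty stack.
        $text                             # Interpolated as a pattern, as above.
        (?:
            \s
            (\w+)
            (?{ [ @{$^R}, $^N ] })        # New array: old stack plus this word.
        )+
        (?{ @matches = @{$^R} })          # Success! Save the result.
    /x or return;                         # No match: return an empty list.
    return @matches;
}

my @words = captures_via_stack("hello foo bar baz", "hello");
print "@words\n";
```

Because every iteration builds a new array instead of modifying the old one, backtracking to an earlier state only has to restore an earlier $^R.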

Since Perl 5.8.0, the $1 in the above can be replaced with $^N, which holds the most recently closed capture group.

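It's possible to simplify the above code, since the regexp engine will never backtrack through (?{ [ @{$^R}, $1 ] }) in this particular regexp, but it's much safer to assume there's always the possibility of backtracking through any (?{...}). That's why $^R is used: each assignment to $^R is localized, so its previous value is restored when the engine backtracks through the (?{...}), whereas a side effect such as pushing onto @matches would not be undone.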
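The difference shows up as soon as the engine does backtrack. In this hypothetical demo (the sub name and the input "abc" are illustrative only), the greedy loop in ^(?:(\w)...)+c\z first consumes all of abc, then has to give back its last iteration so that the literal c can match. The push version keeps the c it recorded on the way in; the $^R version gets its earlier value back.

```perl
use strict;
use warnings;

our (@pushed, $stacked);    # Package variables, as in the original posts.

# Hypothetical demo: record the captures of a greedy loop two ways.
sub compare_push_and_stack {
    local @pushed = ();
    local $stacked;

    # Side effect: push each capture as soon as it is made.
    "abc" =~ /^ (?: (\w) (?{ push @pushed, $^N }) )+ c \z/x
        or die "push version did not match\n";

    # $^R: build a new list per iteration; backtracking restores the old one.
    "abc" =~ /^ (?{ [] }) (?: (\w) (?{ [ @{$^R}, $^N ] }) )+ c \z (?{ $stacked = $^R })/x
        or die "stack version did not match\n";

    # Copy the results out before the local()s are undone.
    return ([@pushed], [@$stacked]);
}

my ($pushed, $from_stack) = compare_push_and_stack();
print "push:  @$pushed\n";
print "stack: @$from_stack\n";
```

Running it should print a b c for the push version and a b for the $^R version, even though only a and b were captured by the successful match.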

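Update: The stack is unnecessarily big in the above code: every iteration copies the whole list built so far into a new array, and each older array is kept around in case of backtracking, so the total size grows quadratically with the number of words. The following stores each word in a small [previous, word] pair instead and flattens the chain once at the end. That greatly reduces the size of the stack, which probably also speeds things up considerably.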

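sub flatten_list {
   my ($rv, $p) = @_;         # Output array ref, last node of the chain.
   @$rv = ();
   while ($p) {
      unshift @$rv, $p->[1];   # This node's word goes before the later words.
      $p = $p->[0];            # Follow the link to the previous node.
   }
}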

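our @matches;
local *matches;
/
   (?{ undef })               # Start from an empty list, not a leftover value.
   $text
   (?:
      \s
      (\w+)
      (?{ [ $^R, $1 ] })      # New node: [previous node, this word].
   )+
   (?{ flatten_list \@matches, $^R })   # Success! Save the result.
/x;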
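Here is the linked-list version as a self-contained sketch, under the same assumptions as the earlier one (hypothetical wrapper name captures_via_list, string passed in explicitly). It starts the chain with (?{ undef }) as a precaution, so that a $^R value left over from an earlier match can't be chained onto the first node.

```perl
use strict;
use warnings;
use re 'eval';    # Needed before Perl 5.18: $text is interpolated next to (?{...}).

our @matches;

# Walk a chain of [previous, word] pairs back to front, filling @$rv in order.
sub flatten_list {
    my ($rv, $p) = @_;
    @$rv = ();
    while ($p) {
        unshift @$rv, $p->[1];    # This node's word goes before the later ones.
        $p = $p->[0];             # Step to the previous node.
    }
}

# Hypothetical wrapper around the linked-list version of the regexp.
sub captures_via_list {
    my ($string, $text) = @_;
    local @matches = ();
    $string =~ /
        (?{ undef })                           # Empty chain.
        $text
        (?:
            \s
            (\w+)
            (?{ [ $^R, $^N ] })                # One small node per word.
        )+
        (?{ flatten_list(\@matches, $^R) })    # Success! Save the result.
    /x or return;
    return @matches;
}

print join(" ", captures_via_list("hello foo bar baz", "hello")), "\n";
```

Each iteration allocates a two-element node instead of copying the whole list, so the work per word stays constant.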

Re^2: Getting + and * to generate multiple captures by jgeisler (Initiate) on Aug 17, 2006 at 19:29 UTC
Thanks, that will do what I want. I'm assuming I need to access `@matches` explicitly after running the match to grab the values I care about? Can I do something like

`/ $text (?: \s (\w+) (?{ push @matches, $^N }) )+ /x;`

to just populate `@matches` instead of creating the stack and then flattening it?
Re^3: Getting + and * to generate multiple captures by ikegami (Patriarch) on Aug 17, 2006 at 19:53 UTC
For that very specific regexp, yes. That's the simplification to which I alluded. I'll repeat the reason I didn't post the simplification: it's much safer to assume there's always the possibility of backtracking through any `(?{...})`. That's why `$^R` is used. It's too easy to miss a case where backtracking can occur. For example, each of these is wrong:

- `/ $text (?: \s (\w+) (?{ push @matches, $^N }) ){2,} /x`
- `/ $text (?: \s (\w+) (?{ push @matches, $^N }) )+ ... /x`
- `/ $text (?: \s (\w+) (?{ push @matches, $^N }) ... )+ /x`

In each one, something after the `(?{...})` can fail and send the engine backtracking through it. Unlike an assignment to `$^R`, a `push` is not undone, so `@matches` can be left holding words that aren't part of the final match, even when the match fails altogether.
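To see the `{2,}` case go wrong, here is a minimal sketch that uses the literal foo in place of `$text`; the sub name push_with_min_two and the inputs are hypothetical. With only one word after foo, `{2,}` can never be satisfied and the match fails, yet the push has already happened and is not undone.

```perl
use strict;
use warnings;

our @matches;

# Hypothetical demo: the {2,} pattern from the reply above, with foo in place of $text.
sub push_with_min_two {
    my ($string) = @_;
    local @matches = ();
    my $matched = $string =~ / foo (?: \s (\w+) (?{ push @matches, $^N }) ){2,} /x;
    return ($matched ? 1 : 0, scalar @matches);
}

# Only one word follows foo, so {2,} cannot be satisfied.
my ($matched, $pushed) = push_with_min_two("foo bar!");
print "matched: $matched, words pushed anyway: $pushed\n";
```

The exact number of stale entries depends on how far the engine backtracks into `\w+`; the point is only that it is not zero.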