Getting + and * to generate multiple captures

jgeisler has asked for the wisdom of the Perl Monks concerning the following question:

Re: Getting + and * to generate multiple captures by ikegami (Patriarch) on Aug 17, 2006 at 17:04 UTC
Yet another completely different approach is to embed code in the regexp.

```perl
# We need use re 'eval' because we use interpolation and (?{...})
# in the same regexp. Beware of the implications of this directive.
use re 'eval';

our @matches;    # Don't use a lexical for this.
local @matches;  # Protect our caller's variables.

/
    (?{ [] })  # Create a stack
    $text
    (?:
        \s
        (\w+)
        (?{ [ @{$^R}, $1 ] })  # Save last match on the stack.
    )+
    (?{ @matches = @{$^R}; })  # Success! Save the result.
/x;
```

Since Perl 5.8.0, the `$1` in the above can be replaced with `$^N`.

It's possible to simplify the above code since the regexp engine will never backtrack through `(?{ [ @{$^R}, $1 ] })` in this particular regexp, but it's much safer to assume there's always the possibility of backtracking through any `(?{...})`. That's why `$^R` is used.

Update: The stack is unnecessarily big in the above code, because every iteration copies the whole array built so far. The following stores only a reference to the previous node plus the new match, which greatly reduces the size of the stack and probably also speeds things up greatly.

```perl
sub flatten_list {
    my ($rv, $p) = @_;
    @$rv = ();
    while ($p) {
        unshift @$rv, $p->[1];
        $p = $p->[0];
    }
}

our @matches;
local @matches;

/
    $text
    (?:
        \s
        (\w+)
        (?{ [ $^R, $1 ] })
    )+
    (?{ flatten_list \@matches, $^R })
/x;
```
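To make the data structure in the second version easier to picture, here is a small sketch that runs `flatten_list` on a hand-built chain instead of a live match. The chain contents and the variable names `$chain` and `@words` are made up for illustration; the sketch also assumes `$^R` is undefined when the match starts, so the oldest node's "previous" slot is `undef`.

```perl
use strict;
use warnings;

# Same helper as in the post: walk from the newest node back to the
# oldest, unshifting so the result comes out in match order.
sub flatten_list {
    my ($rv, $p) = @_;
    @$rv = ();
    while ($p) {
        unshift @$rv, $p->[1];   # slot 1 holds the captured word
        $p = $p->[0];            # slot 0 points at the previous node
    }
}

# Hypothetical chain, shaped like what (?{ [ $^R, $1 ] }) would leave in
# $^R after three iterations that captured 'jumps', 'over', 'the'.
my $chain = [ [ [ undef, 'jumps' ], 'over' ], 'the' ];

my @words;
flatten_list(\@words, $chain);
print join(',', @words), "\n";   # prints "jumps,over,the"
```

Each node is a two-element array, so an iteration costs constant space no matter how many words came before it.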
Re^2: Getting + and * to generate multiple captures by jgeisler (Initiate) on Aug 17, 2006 at 19:29 UTC
Thanks, that will do what I want. I'm assuming I need to access `@matches` explicitly after running the match to grab the values I care about? Can I do something like:

```perl
/
    $text
    (?:
        \s
        (\w+)
        (?{ push @matches, $^N })
    )+
/x;
```

to just populate `@matches` instead of creating the stack and then flattening it?
Re^3: Getting + and * to generate multiple captures by ikegami (Patriarch) on Aug 17, 2006 at 19:53 UTC
For that very specific regexp, yes. That's the simplification to which I alluded. I'll repeat the reason I didn't post the simplification: it's much safer to assume there's always the possibility of backtracking through any `(?{...})`. That's why `$^R` is used.

It's too easy to miss a case where backtracking can occur. For example,

`/ $text (?: \s (\w+) (?{ push @matches, $^N }) ){2,} /` is wrong.

`/ $text (?: \s (\w+) (?{ push @matches, $^N }) )+ ... /` is wrong.

`/ $text (?: \s (\w+) (?{ push @matches, $^N }) ... )+ /` is wrong.
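The `{2,}` case can be tried out with a small sketch like the one below. The input string and the names `@pushed`, `@saved`, `$ok_push` and `$ok_saved` are hypothetical, and `\Q...\E` is added only so the interpolated text is matched literally. Only one word follows `fox`, so both matches fail; the point is what each variant leaves behind.

```perl
use strict;
use warnings;
use re 'eval';   # harmless on newer perls, needed on older ones here

my $text = 'fox';
my $str  = 'fox jumps';   # hypothetical input: only ONE word follows

# Variant 1: push directly. The push is a side effect the engine cannot
# undo when it backtracks out of the group.
our @pushed = ();
my $ok_push = $str =~ /
    \Q$text\E
    (?: \s (\w+) (?{ push @pushed, $^N }) ){2,}
/x;

# Variant 2: build the list in $^R, which the engine restores on
# backtracking, and copy it out only once the whole pattern has matched.
our @saved = ();
my $ok_saved = $str =~ /
    (?{ [] })
    \Q$text\E
    (?: \s (\w+) (?{ [ @{$^R}, $^N ] }) ){2,}
    (?{ @saved = @{$^R} })
/x;

printf "push: matched=%d, entries left behind=%d\n",
    ($ok_push ? 1 : 0), scalar @pushed;
printf "\$^R:  matched=%d, entries left behind=%d\n",
    ($ok_saved ? 1 : 0), scalar @saved;
```

Neither pattern matches, yet `@pushed` ends up non-empty because the code block ran before the engine gave up on the second repetition, while `@saved` stays empty.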
Re: Getting + and * to generate multiple captures by liverpole (Monsignor) on Aug 17, 2006 at 16:55 UTC
Hi jgeisler,

Have you tried split?

```perl
use strict;
use warnings;

my $msg   = "the quick brown fox jumps over the lazy dog";
my $text  = "fox";
my @words = ( );

if ($msg =~ /$text \s (((\w+) \s?)+)/x) {
    @words = split(/\s+/, $1);
    # @words now contains the list you want...
    printf "Words: %s\n", join(',', @words);
}

__END__

[Results]
Words: jumps,over,the,lazy,dog
```
Re^2: Getting + and * to generate multiple captures by jgeisler (Initiate) on Aug 17, 2006 at 18:00 UTC
I simplified my problem somewhat to ask the initial question. `split()` would cause me much pain because I really have multiple regular expressions with quite different separators (not the simple `\s` used in the example code). However, since all the regular expressions capture the same kind of content, I would like to use a single piece of code after whichever regular expression matches and returns the relevant information. In other words, I'm doing something like:

```perl
foreach my $rx (@rxs) {
    if (my @captures = $text =~ /$rx/) {
        # do something meaningful with the captures
    }
}
```

With `split()` I'd need a separate step to split apart each value, with a different separator for each regular expression, which negates much of the gain of the loop.
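For reference, a minimal sketch of that loop, with two made-up patterns (`@rxs`) and two made-up inputs, shows why the plain list-context match falls short: a capture group inside a repeated group reports only its last repetition. The hash `%got` exists only so the result can be inspected afterwards.

```perl
use strict;
use warnings;

# Hypothetical patterns with different separators (whitespace, comma).
my @rxs = (
    qr/fox \s (?: (\w+) \s? )+/x,
    qr/fox ,  (?: (\w+) ,?  )+/x,
);

my %got;
for my $input ('fox jumps over', 'fox,jumps,over') {
    for my $rx (@rxs) {
        # List-context match: returns the captures, or an empty list
        # (false in the condition) when the pattern does not match.
        if (my @captures = $input =~ $rx) {
            $got{$input} = [@captures];
            print "$input => (@captures)\n";
            last;
        }
    }
}
```

Both inputs yield the single capture `over`, not `jumps` and `over`.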
Re: Getting + and * to generate multiple captures by prasadbabu (Prior) on Aug 17, 2006 at 16:17 UTC
Hi jgeisler,

You have used '+', which does match an unknown number of words. But to capture all of the matched words after the text, you need another pair of parentheses around the repeated group, as shown below.

```perl
use strict;
use warnings;

my $text = 'text';
my $str  = 'text some words here';

if ($str =~ /$text \s ((?: (\w+) \s?)+)/x) {
    print "The words after text are :$1\n";
}
```

prints:

```
The words after text are :some words here
```

Prasad
Re^2: Getting + and * to generate multiple captures by jgeisler (Initiate) on Aug 17, 2006 at 17:48 UTC
The problem with this is that I want each word in a separate array element. I'd have to use `split()` on this outer capture, and it turns out that this is not ideal for my bigger problem.