capturing multiple repeated regex subparts

perl-diddler has asked for the wisdom of the Perl Monks concerning the following question:

I want to read in a line with a text field, followed by 4 floats separated by unspecified non-numerics.

I used an RE of the form:
(text);(?:(float)[non-num]){4}

This matches as expected, but I'm only getting 2 substring matches instead of the 5 desired substrings. $1 is the text field, but $2 is filled with the final float substring, with floats 1-3 being tossed. This isn't desirable.

I could duplicate the sub-RE that has the {4} count tag 4 times, but that seems wasteful and less clear. Is there a way to keep the idea of matching the sub-RE 4 times while also getting back the first 3 matches?

Seems like such a simple concept...sigh. Is this doable without nibbling at the line in a loop that picks off the trailing numerics with successive search & replace operations? TIA -l

2006-02-18 Retitled by planetscape, as per Monastery guidelines
Original title: 'simple question ?'

Re: capturing multiple repeated regex subparts by ikegami (Patriarch) on Feb 17, 2006 at 21:01 UTC
Regexps are useful for validation, extraction and tokenizing. However, they are not as strong at parsing, as you have discovered. Parsing is nonetheless possible, using advanced features.

```
use v5.8.0;  # or higher (needed for $^N)

our @rv;
our @temp_rv;

/
   (text);
   (?{ local @temp_rv = ( @temp_rv, $^N ) })
   (?:
      (float)
      (?{ local @temp_rv = ( @temp_rv, $^N ) })
      (?:non-num)
   ){4}
   (?{ @rv = @temp_rv })
/x;
```

Tested.

`local` is needed in case of backtracking. `@rv = @temp_rv` is necessary because `@temp_rv` will wind back to its original value before the regexp exits. Package (`our`) variables, rather than lexical (`my`) variables, are needed because the code blocks in the regexp act as closures.
Re^2: capturing multiple repeated regex subparts by japhy (Canon) on Feb 17, 2006 at 21:18 UTC
Don't eschew `$^R`! That's what it's there for:

```
# UPDATED: added comments about what's going on
our @rv;  # you like taking trips? ;)

m{
  (text);
  # the (?{ ... }) block's return value
  # is given to $^R, the magical variable
  # whose value is auto-localized and gets
  # rolled back when backtracking occurs.
  # $^R's initial value, then, is an array
  # ref with one element, $1's value.
  (?{ [$1] })
  (?:
    (float)
    # then, four times, we add the float we
    # match in $2 to the end of $^R. we
    # can't just do push(@{$^R}, $2), because
    # that would break the auto-rollback magic,
    # so instead, we just let the return value
    # set $^R again.
    (?{ [ @{$^R}, $2 ] })
    non-num
  ){4}
  # finally, we store @{$^R} in @rv.
  (?{ @rv = @{$^R} })
}x;
```

Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
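For readers who want something to paste and run, here is a minimal sketch of the `$^R` approach with the placeholders filled in. The concrete patterns (`[^;]*` for the text field, `-?\d+(?:\.\d+)?` for a float, `[^\d.+-]` for a non-numeric) and the sample line are assumptions for illustration only, not anything the original poster specified. The patterns are written literally rather than interpolated, so that no `use re 'eval'` is needed on older perls.

```perl
use strict;
use warnings;

our @rv;

# Hypothetical sample line: a text field, ';', then four floats,
# each followed by one non-numeric character.
my $line = "asdf;12.34x23.45y34.56z45.67n";

$line =~ m{
    ^ ([^;]*) ;                     # assumed shape of the text field
    (?{ [ $1 ] })                   # seed $^R with an array ref holding $1
    (?:
        ( -? \d+ (?: \. \d+ )? )    # assumed float pattern, captured as $2
        (?{ [ @{$^R}, $2 ] })       # return a new array ref with $2 appended
        [^\d.+-]                    # assumed "non-num" separator
    ){4}
    (?{ @rv = @{$^R} })             # copy the accumulated list out on success
}x or die "line did not match\n";

print join(' | ', @rv), "\n";
```

On a successful match `@rv` should hold the text field followed by the four floats, in order.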
Re: capturing multiple repeated regex subparts by Roy Johnson (Monsignor) on Feb 17, 2006 at 20:57 UTC
Every group to be captured has to have its own explicit set of parentheses in the regex. You can't populate $2, $3, $4, and $5 by putting a quantifier after the 2nd group. So you'll probably want to do this in two steps. I'll pretend you have a $float regex and a $non_num regex:

```
($text, $nums) = /(text);((?:$float$non_num){4})/;
@nums = split $non_num, $nums;
```

Caution: Contents may have been coded under pressure.
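A runnable sketch of the two-step idea, under stated assumptions: the `$float` and `$non_num` definitions, the `[^;]*` text pattern, and the sample line are all made up here for illustration, since the thread only gives placeholders.

```perl
use strict;
use warnings;

# Assumed stand-ins for the thread's "float" and "non-num" placeholders.
my $float   = qr/-?\d+(?:\.\d+)?/;
my $non_num = qr/[^\d.+-]/;

# Hypothetical sample line.
my $line = "asdf;12.34x23.45y34.56z45.67n";

# Step 1: capture the text field and the whole run of four floats.
my ($text, $nums) = $line =~ /^([^;]*);((?:$float$non_num){4})/
    or die "line did not match\n";

# Step 2: split the run on the separators. split drops trailing
# empty fields, so the final separator contributes nothing.
my @nums = split $non_num, $nums;

print join(' | ', $text, @nums), "\n";   # prints "asdf | 12.34 | 23.45 | 34.56 | 45.67"
```

The first match validates the overall shape of the line, so the `split` only ever sees input that is known to contain exactly four floats.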
Re: capturing multiple repeated regex subparts by GrandFather (Saint) on Feb 17, 2006 at 20:55 UTC
You have only two sets of capture brackets so you only get two captures. Your suggestion of duplicating the counted match is probably the best answer in this case - that's what copy and paste is for in your editor :)

DWIM is Perl's answer to Gödel
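If copy and paste feels wasteful, one way to get the same effect is to let Perl do the duplication with the `x` repetition operator while building the pattern. This is only a sketch: `$float`, `$non_num`, the `[^;]*` text pattern, and the sample line are assumptions standing in for the thread's placeholders.

```perl
use strict;
use warnings;

# Assumed stand-ins for the thread's "float" and "non-num" placeholders.
my $float   = qr/-?\d+(?:\.\d+)?/;
my $non_num = qr/[^\d.+-]/;

# Repeat the "(float)non-num" source text four times, so the final
# pattern really contains four separate capture groups.
my $four_floats = "($float)$non_num" x 4;
my $re          = qr/^([^;]*);$four_floats/;

# Hypothetical sample line.
my $line = "asdf;12.34x23.45y34.56z45.67n";

# In list context the match returns $1 .. $5.
my @fields = $line =~ $re
    or die "line did not match\n";

print join(' | ', @fields), "\n";   # prints "asdf | 12.34 | 23.45 | 34.56 | 45.67"
```

The count stays in one place (the `x 4`), which keeps the "match this four times" intent visible even though the compiled regex has the group written out four times.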
Re: capturing multiple repeated regex subparts by kwaping (Priest) on Feb 18, 2006 at 00:01 UTC
How's this?

```
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper::Simple;

# (text);(?:(float)[non-num]){4};
my $text = "asdf;12.34x23.45y34.56z45.67n";
my @matches = ($text =~ /([\d.]+\D)/g);
print Dumper(@matches);
```
Re: capturing multiple repeated regex subparts by neilwatson (Priest) on Feb 17, 2006 at 20:49 UTC
Some real code and real sample data would be helpful.

Neil Watson
watson-wilson.ca
Re: capturing multiple repeated regex subparts by Aristotle (Chancellor) on Feb 20, 2006 at 03:44 UTC
In addition to the other solutions posted, you can break up the regex using `/g` and `\G` and do the looping yourself.

```
my @submatch;
{
    $str =~ /(text);/g or last;
    push @submatch, $1;
    for( 1 .. 4 ) {
        $str =~ /\G(?:(float)[non-num])/g or do { @submatch = (); last; };
        push @submatch, $1;
    }
}
```

Makeshifts last the longest.
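A concrete sketch of the `/g` plus `\G` loop, with the placeholders filled in. As before, `$float`, `$non_num`, the `[^;]*` text pattern, and the sample string are assumptions for illustration; `if`/`else` is used here in place of the `or last` idiom purely for readability.

```perl
use strict;
use warnings;

# Assumed stand-ins for the thread's "float" and "non-num" placeholders.
my $float   = qr/-?\d+(?:\.\d+)?/;
my $non_num = qr/[^\d.+-]/;

# Hypothetical sample line.
my $str = "asdf;12.34x23.45y34.56z45.67n";
my @submatch;

# A scalar-context /g match leaves pos($str) just after the ';'.
if ( $str =~ /^([^;]*);/g ) {
    push @submatch, $1;
    for ( 1 .. 4 ) {
        # \G anchors each attempt exactly where the previous match ended.
        if ( $str =~ /\G($float)$non_num/g ) {
            push @submatch, $1;
        }
        else {
            @submatch = ();   # all or nothing: discard partial results
            last;
        }
    }
}

print join(' | ', @submatch), "\n";   # prints "asdf | 12.34 | 23.45 | 34.56 | 45.67"
```

Because each step is its own match, the loop count can also come from a variable, which the single-regex versions cannot do as easily.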