in reply to regexp match repetition breaks in Perl

In list context, a match operation always returns exactly as many values as there are capture groups, so /(...)(?:(...))*(...)/ will always return exactly three values on a match. A quantified group is still a single capture, and it keeps only the text from its last repetition. When using the g modifier, the match operation always returns an exact multiple of the number of captures. You need a parser.
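A small sketch of this behaviour, using a made-up pattern and made-up strings rather than the original poster's data. The repeated group `(?:\s(\d))*` contributes one slot no matter how many times it repeats, and that slot holds only the last repetition (or undef if the group never matched).

```perl
use strict;
use warnings;

my $re = qr/(x)(?:\s(\d))*\s(y)/;    # three capture groups

# The group repeats three times, but only its last match ('3') survives.
my @one = 'x 1 2 3 y' =~ $re;
print scalar(@one), ": @one\n";      # 3: x 3 y

# If the group never matches, its slot is still returned, as undef.
my @none = 'x y' =~ $re;
print scalar(@none), "\n";           # 3

# With /g in list context: two matches, so 2 x 3 = 6 values.
my @all = 'x 1 2 y x 4 y' =~ /$re/g;
print scalar(@all), ": @all\n";      # 6: x 2 y x 4 y
```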

Here's a simple solution:

```perl
my @extracts;
while ($text =~ m/
    (APC[s]?)
    ( \s \d{3} (?: (?: , \s \d{3} )* \s and \s \d{3} )? )
/xg) {
    my ($apc, $nums) = ($1, $2);
    my @nums = $nums =~ /(\d+)/g;    # second match splits the list into codes
    push @extracts, $apc, @nums;
}
```
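To try the solution end to end, here is a self-contained sketch. The sample sentence in `$text` is hypothetical (the thread never shows the real document); the regex and loop are the ones from the solution.

```perl
use strict;
use warnings;

# Hypothetical sample input, made up for illustration.
my $text = 'See APC 101 and APCs 202, 303 and 404 for details.';

my @extracts;
while ($text =~ m/ (APC[s]?) ( \s \d{3} (?: (?: , \s \d{3} )* \s and \s \d{3} )? ) /xg) {
    my ($apc, $nums) = ($1, $2);
    my @nums = $nums =~ /(\d+)/g;    # second match splits the list into codes
    push @extracts, $apc, @nums;
}

print "@extracts\n";    # APC 101 APCs 202 303 404
```

The first regex only has to recognise where a list of codes starts and ends; the variable-length part is handled by the second, simpler //g match, which sidesteps the fixed-capture-count limitation.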

By the way, what's with pos($text) = 0 and the c modifier? Removed!

Update: Added a solution.

Re^2: regexp match repetition breaks in Perl
by barkingdoggy (Initiate) on Jul 11, 2007 at 23:14 UTC

    Thanks to ikegami et al. for setting me on the path of righteousness!


    Re pos($text), I'm doing multiple passes through the document, picking out different things on each pass. So, I need to reset the pos before each pass. 3-digit APC codes are just one of the passes.

    I am/was extracting "APC" to help debug the Perl regex. In the final version, I will just extract the code/number.

    Thanks, again!
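If the final version only needs the codes, one possible variant (an assumption about how it could look, not the poster's actual code) is to make the APC group non-capturing and keep just the numbers. The sample string is again hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input, made up for illustration.
my $text = 'See APC 101 and APCs 202, 303 and 404 for details.';

my @codes;
while ($text =~ m/ APC[s]? ( \s \d{3} (?: (?: , \s \d{3} )* \s and \s \d{3} )? ) /xg) {
    # Copy $1 first: the inner match below would overwrite it.
    my $list = $1;
    push @codes, $list =~ /(\d{3})/g;
}

print "@codes\n";    # 101 202 303 404
```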

      So, I need to reset the pos before each pass. 3-digit APC codes are just one of the passes.

      Not quite.

      ```perl
      $_ = 'a1!b2.c3?';
      print $1 while /([abc])/g;    # abc
      print $1 while /([123])/g;    # 123
      print $1 while /([!.?])/g;    # !.?
      print "\n";
      ```

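      Note the lack of the c modifier. When a //g match fails, pos is reset automatically, so each pass starts again at the beginning of the string without any pos($text) = 0. The c modifier does exactly the opposite of what you want: its purpose is to prevent pos from being reset when the match fails.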
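A short sketch of the difference, reusing the sample string from the example above but in a named variable instead of `$_`. It runs two passes over the same string, once without /c and once with it.

```perl
use strict;
use warnings;

my $text = 'a1!b2.c3?';

# Without /c: the final, failing //g match resets pos($text),
# so the second pass starts again at the beginning of the string.
my $plain = '';
$plain .= $1 while $text =~ /([abc])/g;
$plain .= $1 while $text =~ /([123])/g;
print "$plain\n";     # abc123

# With /c: the failing match leaves pos($text) just after 'c',
# so the second pass only sees the remaining '3?'.
my $sticky = '';
$sticky .= $1 while $text =~ /([abc])/gc;
$sticky .= $1 while $text =~ /([123])/gc;
print "$sticky\n";    # abc3
```

For multiple independent passes over a document, the plain //g form is the one you want; /c is meant for tokenizer-style code that tries several patterns at the same position.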