You have a few options:
  1. You can match the whole expression first, and do the iterative (/g) match only if the whole expression matches.
  2. You can accumulate the captures from the iterative match (on edit: using the /c option), and then test against /\G\Z/g before processing them.
  3. You can add a lookahead for the whole rest of the expression (I don't recommend this, because it duplicates a lot of effort compared to option 1).
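
    # Option 1: check the whole record first, then iterate over its items
    my $item_regex = qr/
        (?:int)\s+(\w+)\s*?\n
        (?:\s*Number\ of\ Flaps:\s(\d+)\s*?\n)?
        (?:\s*IP\sAddress:?\s([\d\.]+)\s*?\n)?
    /x;

    while (<DATA>) {
        # Proceed only if the record is nothing but a run of items
        if (/^$item_regex+\Z/) {
            print "$1, $2, $3\n" while (/\G$item_regex/g);
        }
    }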

Caution: Contents may have been coded under pressure.

Re^4: Making regex /g match continuously
by scottb (Scribe) on Jan 07, 2005 at 20:23 UTC
    Thanks. I think I like the second approach best.

    If I may, I'd like a bit of elaboration, though: of the 3 approaches you suggested, which is most efficient when there is no match? I notice that when my complicated regexes do not match, the CPU on my server maxes out and the HTTP request times out. When the regex matches, it takes mere fractions of a second.

    Is there a way, within these approaches, to minimize the wasted effort of the regex when not all of the 'int's will match?

    Thanks again.

      I expect that option 1 would be the most efficient.
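To check that expectation against your own data, Perl's core Benchmark module can time the options side by side. This is a rough sketch rather than code from the thread: the subs option1, option2 and option3 are hypothetical wrappers that take the whole text as one string instead of reading DATA, and the records in $good and $bad are made up. The input fails only at its last line, which is the non-matching case asked about.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Item regex from the thread.
my $item_regex = qr/
    (?:int)\s+(\w+)\s*?\n
    (?:\s*Number\ of\ Flaps:\s(\d+)\s*?\n)?
    (?:\s*IP\sAddress:?\s([\d\.]+)\s*?\n)?
/x;

# Option 1 (hypothetical wrapper): validate the whole string, then iterate.
sub option1 {
    my ($s) = @_;
    my @items;
    if ($s =~ /^$item_regex+\z/) {
        push @items, [$1, $2, $3] while $s =~ /\G$item_regex/g;
    }
    return \@items;
}

# Option 2 (hypothetical wrapper): iterate, flagging whether the last
# successful match ended at the end of the string.
sub option2 {
    my ($s) = @_;
    my (@items, $ate_all);
    while ($s =~ /\G$item_regex/g) {
        push @items, [$1, $2, $3];
        $ate_all = (pos($s) == length $s);
    }
    return $ate_all ? \@items : [];
}

# Option 3 (hypothetical wrapper): the lookahead re-checks the rest of
# the string on every iteration.
sub option3 {
    my ($s) = @_;
    my @items;
    push @items, [$1, $2, $3] while $s =~ /\G$item_regex(?=$item_regex*\z)/g;
    return \@items;
}

# Made-up sample data: 50 valid items, then one bad line at the very end.
my $good = '';
$good .= "int eth$_\n  Number of Flaps: $_\n  IP Address: 10.0.0.$_\n" for 1 .. 50;
my $bad = $good . "garbage\n";

# Run each option for at least 1 CPU second on the non-matching input.
cmpthese(-1, {
    option1 => sub { option1($bad) },
    option2 => sub { option2($bad) },
    option3 => sub { option3($bad) },
});
```

Any timings it prints depend on your data and Perl version, so substitute real records before drawing conclusions.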

      Option 2 as originally described doesn't work (but see tye's response below), because when a match fails, pos is reset, so it won't be at the end of the string. Instead, you have to set a flag by testing pos against length within the loop:

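          while (<DATA>) {
              my @accum;
              my $ate_the_whole_thing = 0;
              while (/\G$item_regex/g) {
                  push @accum, [$1, $2, $3];
                  # True only if this match ended at the end of the string
                  $ate_the_whole_thing = (pos() == length);
              }
              # The final, failed match has reset pos(), so rely on the flag
              if ($ate_the_whole_thing) {
                  print join(',', @$_), "\n" for @accum;
              }
          }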
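The pos reset described above can be seen on a toy string. In this minimal sketch (the string and variable names are made up), the same \G loop runs once without and once with /c, and the position left behind after the final, failing match is printed.

```perl
use strict;
use warnings;

my $s = "aab";    # made-up input: two "a" items, then something else

# Without /c, the match that finally fails resets pos() to undef.
1 while $s =~ /\Ga/g;
my $pos_without_c = pos($s);

# With /c, the failing match leaves pos() where the last success ended.
1 while $s =~ /\Ga/gc;
my $pos_with_c = pos($s);

print 'without /c: ', (defined $pos_without_c ? $pos_without_c : 'undef'), "\n";
print 'with /c:    ', (defined $pos_with_c    ? $pos_with_c    : 'undef'), "\n";
```

It prints undef for the first loop and 2 for the second, which is why the plain /g version has to record the result inside the loop.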

      Option #3 would put a lookahead in the iterative match, which is expensive (as I noted), but does work:

          while (<DATA>) {
              my @accum;
              # The lookahead re-checks that the rest of the string is all items
              push @accum, [$1, $2, $3]
                  while (/\G$item_regex(?=$item_regex*\Z)/g);
              print join(',', @$_), "\n" for @accum;
          }


        Option 2 is the clear choice for me. I guess you missed the /c option in perlop.

        - tye        
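With /c, option 2 can be written as originally described: accumulate the captures, then check with \G that the loop stopped at the end of the string. A minimal sketch, assuming the whole input is available as one string; parse_items and the sample text are hypothetical, and \z (absolute end of string) is used in place of \Z.

```perl
use strict;
use warnings;

# Item regex from the thread.
my $item_regex = qr/
    (?:int)\s+(\w+)\s*?\n
    (?:\s*Number\ of\ Flaps:\s(\d+)\s*?\n)?
    (?:\s*IP\sAddress:?\s([\d\.]+)\s*?\n)?
/x;

# Hypothetical helper: returns an array ref of [name, flaps, ip] triples
# if the whole string is a run of items, or undef otherwise.
sub parse_items {
    my ($s) = @_;
    my @accum;
    # /c keeps pos() in place when the final match of the loop fails
    while ($s =~ /\G$item_regex/gc) {
        push @accum, [$1, $2, $3];
    }
    # Accept only if the last successful match ended at the end of the string
    return $s =~ /\G\z/gc ? \@accum : undef;
}

# Made-up sample input; eth1 has no "Number of Flaps" line.
my $text = "int eth0\n  Number of Flaps: 3\n  IP Address: 10.0.0.1\n"
         . "int eth1\n  IP Address: 10.0.0.2\n";

if (my $items = parse_items($text)) {
    # Print missing (undef) fields as empty strings
    print join(',', map { defined $_ ? $_ : '' } @$_), "\n" for @$items;
}
```

Appending any non-item line to $text makes parse_items return undef, so nothing is printed.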

        Looks good, thanks. For the record, I went with option 1.
      Well, you can check whether pos equals length.