scottb has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have some large batches of data that I am matching with a regex and the /g option to repeat. Here is a simplified example of the data:

int A
Number of Flaps: 6
IP Address: 2.2.2.2
int B
Number of Flaps: 8
IP Address 5.5.5.5
unpredictable data
int C
Number of Flaps: 9
IP Address 9.9.9.9
I am using the following regular expression for this example. The number of 'int's can change on me.
^(?:int)\s+(\w+)\s*?\n
(?:\s*Number\sof\sFlaps:\s(\d+)\s*?\n)?
(?:\s*IP\sAddress:?\s([\d\.]+)\s*?\n)?
(?=^int|\Z)
The problem is this: the regex matches 'A' and 'C' but not 'B', because of the unpredictable data that has shown up in 'B'. This is of course correct, but my desired behaviour is for the regex to fail because 'B' didn't match. In other words, I want each iteration of the regex to pick up exactly where the last one left off, so that /g is not allowed to skip over sections of text it doesn't match. Is this possible? I know I could write one full regex and not use /g, but the number of 'int's does change. I desire the best of both worlds.

Is there any way I can have my cake and eat it too?

Thanks in advance,
Scott

Replies are listed 'Best First'.
Re: Making regex /g match continuously
by Eimi Metamorphoumai (Deacon) on Jan 07, 2005 at 18:56 UTC
    It sounds like what you want is to start with \G instead of ^. \G matches at the location where the last /g match ended. (See pos, and search perlop and perlre for \G, for more details and examples.)
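A minimal sketch of the difference, using made-up sample text rather than the original data (the variable names are only for illustration): with ^ and /m, /g is free to skip the line that doesn't fit, while with \G each match has to begin where the previous one ended.

```perl
use strict;
use warnings;

# Hypothetical sample text: the record for B is followed by an unexpected line.
my $text = "int A\nint B\n junk\nint C\n";

# Anchored with ^ under /m: /g skips over the junk line and finds all three names.
my @loose = $text =~ /^int\s+(\w+)\s*?\n/mg;

# Anchored with \G: each match must start exactly where the previous one ended,
# so the scan stops at the first piece of text that does not fit.
my @strict = $text =~ /\Gint\s+(\w+)\s*?\n/g;

print "loose:  @loose\n";
print "strict: @strict\n";
```

Here the \G version stops after B instead of jumping ahead to C. Note that it still returns the records that came before the bad spot.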
      Eimi,

      That is good info and really helps. The only remaining problem is that the following regex still succeeds in matching 'A' in the example data I provided.

      \G ^(?:int)\s+(\w+)\s*?\n
      (?:\s*Number\sof\sFlaps:\s(\d+)\s*?\n)?
      (?:\s*IP\sAddress:?\s([\d\.]+)\s*?\n)?
      (?=^int|\Z)
      Is there a way I am missing to make the whole regex fail?
        You have a couple of options:
        1. You can do a match on the whole expression, and do the iterative (/g) match if the whole expression matches,
        2. You can accumulate the captures from the iterative match (on edit: using the /c option), and then test against /\G\Z/g before processing your way through them,
        3. You can lookahead the whole rest of the expression (I don't recommend this, because it duplicates a lot of effort compared to method 1).
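        # Option 1:
        my $item_regex = qr/
            (?:int)\s+(\w+)\s*?\n
            (?:\s*Number\ of\ Flaps:\s(\d+)\s*?\n)?
            (?:\s*IP\sAddress:?\s([\d\.]+)\s*?\n)?
        /x;
        local $/;    # slurp mode, so each read returns a whole multi-line batch
        while (<DATA>) {
            # First check that the entire batch is nothing but valid items...
            if (/^$item_regex+\Z/) {
                # ...then walk through it item by item to get at the captures.
                print "$1, $2, $3\n" while (/\G$item_regex/g);
            }
        }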
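Option 2 from the list above could look roughly like the following. This is only a sketch: parse_records is a made-up helper name, the sample strings are invented, and it assumes every record ends with a newline.

```perl
use strict;
use warnings;

my $item_regex = qr/
    int\s+(\w+)\s*?\n
    (?:\s*Number\ of\ Flaps:\s(\d+)\s*?\n)?
    (?:\s*IP\ Address:?\s([\d.]+)\s*?\n)?
/x;

# parse_records is a hypothetical helper, not something from the thread.
# It returns a list of [name, flaps, ip] array refs, or an empty list if
# any part of the text fails to match.
sub parse_records {
    my ($text) = @_;
    my @records;
    pos($text) = 0;
    # /c leaves pos() at the end of the last successful match when a match fails.
    while ($text =~ /\G$item_regex/gc) {
        push @records, [$1, $2, $3];
    }
    # Accept the batch only if the matches ran all the way to the end.
    return $text =~ /\G\Z/gc ? @records : ();
}

# Invented sample input: one clean batch, one with a stray line.
my $good = "int A\n Number of Flaps: 6\n IP Address: 2.2.2.2\n"
         . "int C\n Number of Flaps: 9\n IP Address 9.9.9.9\n";
my $bad  = "int A\n Number of Flaps: 6\nint B\n unpredictable data\nint C\n";

my @ok       = parse_records($good);
my @rejected = parse_records($bad);
print scalar(@ok), " record(s) from the clean batch\n";
print scalar(@rejected), " record(s) from the batch with stray data\n";
```

The captures are only handed back once the \G\Z test has confirmed that nothing was skipped, so a stray line anywhere in the batch rejects the whole batch.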

        Caution: Contents may have been coded under pressure.
Re: Making regex /g match continuously
by Roy Johnson (Monsignor) on Jan 07, 2005 at 19:18 UTC
    As I understand your description, you want to match once, not multiple times. The expression either passes or fails, in entirety. You don't want /g. You want an anchor and a quantifier.
    ^
    (?:                                        ## added this
      (?:int)\s+(\w+)\s*?\n
      (?:\s*Number\sof\sFlaps:\s(\d+)\s*?\n)?
      (?:\s*IP\sAddress:?\s([\d\.]+)\s*?\n)?
    )+\Z                                       ## changed your last line
    This will match one or more repetitions of your int/Flaps/Address pattern followed by the end of the string.

    Caution: Contents may have been coded under pressure.
      That would work, but the problem is that the $1, $2, $3 variables will only be set to (IIRC) the last values. So if you want to just test if the value matches, that's fine, but if you want to do something with the values, it's not as useful.
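To illustrate the point about the capture variables, here is a small sketch with invented sample text and a cut-down version of the pattern: the anchored, quantified match validates the whole string, but the capture inside the repeated group ends up holding only the value from the final repetition.

```perl
use strict;
use warnings;

# Hypothetical sample text with three well-formed records.
my $text = "int A\nint B\nint C\n";

# One anchored match with a quantified group. The capture group sits inside
# the repeated part, so each repetition overwrites the previous value.
my $last;
if ($text =~ /^(?:int\s+(\w+)\s*?\n)+\Z/) {
    $last = $1;
}
print "last capture: $last\n";
```

Only C survives in $1 here, which is why a second pass with \G and /g is still needed when the values themselves matter.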