in reply to Match zero times in regex

IP address matching is a bit tricky, so it's often helpful to turn to accumulated wisdom. Regexp::Common and Regexp::Common::net can help. (To see the trickiness, change one of the 255s in the example data of the OP to 256 – or even 666 or 66666 – and see the result using the OP regex.)

>perl -wMstrict -le
"use Regexp::Common qw(net);
 my $IPv4 = qr{ (?<! \d) $RE{net}{IPv4} (?! \d) }xms;
 ;;
 my @strs = (
   'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255',
   'permit ip 10.11.1.0 0.0.0.255 192.168.100.0 0.0.0.255',
   );
 ;;
 for my $s (@strs) {
   print qq{from: '$s'};
   while ($s =~ m{ (?<! host) \s+ ($IPv4) \s+ ($IPv4) }xmsg) {
     print qq{IP pair: '$1' '$2'};
     }
   }
 "
from: 'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255'
IP pair: '192.168.100.0' '0.0.0.255'
from: 'permit ip 10.11.1.0 0.0.0.255 192.168.100.0 0.0.0.255'
IP pair: '10.11.1.0' '0.0.0.255'
IP pair: '192.168.100.0' '0.0.0.255'
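
To make the "trickiness" concrete without installing anything, here is a minimal sketch that compares a naive pattern with a stricter one. Both patterns are my own illustrations: $naive is a hypothetical stand-in (it is not the OP's regex), and $strict is a hand-rolled approximation of what Regexp::Common's IPv4 pattern plus the digit lookarounds gives you, not the module's actual pattern.

```perl
use strict;
use warnings;

# Hypothetical naive pattern (NOT the OP's regex): any 1-3 digits per octet.
my $naive = qr{ \d{1,3} (?: [.] \d{1,3} ){3} }xms;

# Hand-rolled strict pattern (an assumption, not Regexp::Common's own):
# each octet must be 0-255, and the address must not touch other digits.
my $octet  = qr{ 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d }xms;
my $strict = qr{ (?<! \d) $octet (?: [.] $octet ){3} (?! \d) }xms;

# Each sub returns the list of address-like substrings found in $s.
sub naive_ips  { my ($s) = @_; return $s =~ m{ ($naive)  }xmsg }
sub strict_ips { my ($s) = @_; return $s =~ m{ ($strict) }xmsg }

for my $s (
    'permit ip 10.11.1.0 0.0.0.255',
    'permit ip 10.11.1.0 0.0.0.256',
    'permit ip 10.11.1.0 0.0.0.66666',
    ) {
    print "from:   '$s'";
    print "\n  naive:  ", join ' ', map { "'$_'" } naive_ips($s);
    print "\n  strict: ", join ' ', map { "'$_'" } strict_ips($s);
    print "\n";
}
```

The naive pattern happily reports '0.0.0.256', and it pulls '0.0.0.666' out of '0.0.0.66666'; the strict pattern reports only '10.11.1.0' for those two strings.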

Update: Unfortunately, this solution has a bug. See Re^2: Match zero times in regex for counter-example data demonstrating it.

Re^2: Match zero times in regex
by ikegami (Patriarch) on Dec 13, 2011 at 04:26 UTC
    Lookbehind was my first thought, but it doesn't work unless one changes what is currently allowed.
    from: 'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255'
    IP pair: '192.168.100.0' '0.0.0.255'
    from: 'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255'
    IP pair: '10.11.1.1' '192.168.100.0'
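
One way a result like the second one can arise is extra whitespace after 'host': the fixed-width lookbehind only inspects the four characters immediately before the point where \s+ starts, so \s+ can simply start at the second space. The sketch below assumes such an input (two spaces after 'host'; that is my guess, the exact counter-example string may differ) and uses a hand-rolled $IPv4 in place of Regexp::Common so it runs standalone. The ip_pairs_anchored variant is only one possible tightening, not something proposed in this thread.

```perl
use strict;
use warnings;

# Hand-rolled stand-in for the Regexp::Common-based $IPv4 (an assumption).
my $octet = qr{ 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d }xms;
my $IPv4  = qr{ (?<! \d) $octet (?: [.] $octet ){3} (?! \d) }xms;

# Same matching loop as in the parent post, returning "ip ip" strings.
sub ip_pairs {
    my ($s) = @_;
    my @pairs;
    while ($s =~ m{ (?<! host) \s+ ($IPv4) \s+ ($IPv4) }xmsg) {
        push @pairs, "$1 $2";
    }
    return @pairs;
}

# Hypothetical tightening: also require that \s+ begins at the start of
# the whitespace run, so it cannot begin just past the first space.
sub ip_pairs_anchored {
    my ($s) = @_;
    my @pairs;
    while ($s =~ m{ (?<! host) (?<! \s) \s+ ($IPv4) \s+ ($IPv4) }xmsg) {
        push @pairs, "$1 $2";
    }
    return @pairs;
}

my $one_space  = 'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255';
my $two_spaces = 'permit ip host  10.11.1.1 192.168.100.0 0.0.0.255';

print "one space:  $_\n" for ip_pairs($one_space);
print "two spaces: $_\n" for ip_pairs($two_spaces);
print "anchored:   $_\n" for ip_pairs_anchored($two_spaces);
```

With one space the pair is 192.168.100.0 0.0.0.255; with two spaces the original loop yields 10.11.1.1 192.168.100.0, while the anchored variant again yields 192.168.100.0 0.0.0.255. The anchored variant handles only this particular whitespace case.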

      From the OP:

      ... if the entry is
          permit ip host 10.11.1.1 192.168.100.0 0.0.0.255
      I want to pull out
          192.168.100.0 0.0.0.255
      but if the entry is
          permit ip 10.11.1.0  0.0.0.255 192.168.100.0 0.0.0.255
      I want to pull out
          10.11.1.0  0.0.0.255
          192.168.100.0 0.0.0.255

      I don't see how an IP (10.11.1.1 in the example) after 'host' is ever desired to be captured. Am I missing something (wouldn't be the first time)?

        I don't see how an IP (10.11.1.1 in the example) after 'host' is ever desired to be captured.

        Exactly, yet I showed that your code does capture it.

Re^2: Match zero times in regex
by ricDeez (Scribe) on Dec 13, 2011 at 03:39 UTC

    The negative look-behind is what the OP was trying to emulate with the {0}. This is definitely the way to go, using negative look-behind and negative look-aheads in the regex! This is a great solution, IMHO...

    Just a query: in your regexes, why did you use the m modifier?

      ... why did you use the m modifier?

      This is in line with the recommendations of TheDamian's Perl Best Practices (PBP) for regexes. The /m regex modifier causes the ^ and $ regex operators also to match after/before embedded newlines. The invariable use of the /m and /s (dot-matches-all, including newlines) modifiers reduces the number of degrees of freedom enjoyed by these operators: ^, $ and . then behave the same way in every regex, and \A and \z are used when the true start or end of the string is meant. In turn, this reduces potential maintenance headaches (I'll show you my scars sometime) and the general brain-hurt associated with regexes.
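
A minimal sketch of what the two modifiers change, using a made-up two-line string (the data and variable names are mine, purely for illustration):

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /m, ^ matches only at the very start of the string.
my @without_m = $text =~ m{ ^ (\w+) }xg;     # ('first')

# With /m, ^ also matches just after each embedded newline.
my @with_m    = $text =~ m{ ^ (\w+) }xmg;    # ('first', 'second')

# Without /s, . refuses to match the newline between the two lines.
my $dot_default = $text =~ m{ line . second }x  ? 1 : 0;   # 0

# With /s, . matches any character, newline included.
my $dot_with_s  = $text =~ m{ line . second }xs ? 1 : 0;   # 1

print "without /m: @without_m\n";
print "with /m:    @with_m\n";
print "dot without /s: $dot_default, with /s: $dot_with_s\n";
```

Under /x the spaces in the patterns are ignored, so `line . second` means "line", any one character, "second".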

      The PBP recommendations in general and those for regexes in particular are controversial. (See especially BrowserUk for vigorous counter-argument; also, I think, the JavaFan.) I find many, perhaps most, of the recommendations to have compelling arguments in their favor and I rigorously (dare I say blindly?) use those pertaining to regexes.

        Many thanks for your insights on this... I do have the PBP book and dust it off from time to time to make sure I am not straying from the path. I will make sure to look this recommendation up...

Re^2: Match zero times in regex
by SomeNetworkGuy (Sexton) on Dec 13, 2011 at 03:36 UTC
    Thank you. I hadn't heard of Regexp::Common::net. I'll be looking into it.