SomeNetworkGuy has asked for the wisdom of the Perl Monks concerning the following question:

I want to be able to extract the '192.168.100.0 0.0.0.255' out of the below text.

    my $entry = 'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255';
    while ($entry =~ /(?:host){0}\s+(\d+\.\d+\.\d+\.\d+)\s+(\d+\.\d+\.\d+\.\d+)/g) {
        # do something with $1 and $2
    }

I need to find every occurrence of these network-and-mask pairs. So basically, if the entry is

permit ip host 10.11.1.1 192.168.100.0 0.0.0.255

I want to pull out

192.168.100.0 0.0.0.255

but if the entry is

permit ip 10.11.1.0  0.0.0.255 192.168.100.0 0.0.0.255

I want to pull out

10.11.1.0 0.0.0.255 192.168.100.0 0.0.0.255

Why doesn't {0} match zero times like I expect it to?

Re: Match zero times in regex
by AnomalousMonk (Archbishop) on Dec 13, 2011 at 02:08 UTC

    IP address matching is a bit tricky, so it's often helpful to turn to accumulated wisdom: Regexp::Common and Regexp::Common::net can help. (To see the trickiness, change one of the 255s in the OP's example data to 256, or even 666 or 66666, and note that the OP's regex still matches, because \d+ accepts any run of digits.)
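    To see the problem without installing anything, here is a minimal core-Perl sketch. $loose is the OP's pattern; $octet and $strict are my own hand-rolled approximations (assumed names, not Regexp::Common's actual pattern) that accept only 0 to 255 per octet.

```perl
use strict;
use warnings;

# The OP's pattern: any run of digits counts as an octet.
my $loose = qr/\d+\.\d+\.\d+\.\d+/;

# Hand-rolled octet (assumption, not Regexp::Common): 250-255, 200-249, 100-199, 0-99.
my $octet  = qr/(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/;
# Whole dotted quad, not glued to further digits or dots on either side.
my $strict = qr/(?<![\d.])$octet(?:\.$octet){3}(?![\d.])/;

sub is_loose_ip  { return $_[0] =~ /\A$loose\z/  ? 1 : 0 }
sub is_strict_ip { return $_[0] =~ /\A$strict\z/ ? 1 : 0 }

for my $candidate ('0.0.0.255', '0.0.0.256', '0.0.0.66666') {
    printf "%-12s loose=%d strict=%d\n",
        $candidate, is_loose_ip($candidate), is_strict_ip($candidate);
}
```

    With these inputs the loose pattern accepts all three strings, while the strict one rejects 256 and 66666.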

    >perl -wMstrict -le "use Regexp::Common qw(net);
     my $IPv4 = qr{ (?<! \d) $RE{net}{IPv4} (?! \d) }xms;
     ;;
     my @strs = (
       'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255',
       'permit ip 10.11.1.0 0.0.0.255 192.168.100.0 0.0.0.255',
       );
     ;;
     for my $s (@strs) {
       print qq{from: '$s'};
       while ($s =~ m{ (?<! host) \s+ ($IPv4) \s+ ($IPv4) }xmsg) {
         print qq{IP pair: '$1' '$2'};
         }
       }
    "
    from: 'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255'
    IP pair: '192.168.100.0' '0.0.0.255'
    from: 'permit ip 10.11.1.0 0.0.0.255 192.168.100.0 0.0.0.255'
    IP pair: '10.11.1.0' '0.0.0.255'
    IP pair: '192.168.100.0' '0.0.0.255'

    Update: Unfortunately, this solution has a bug. See Re^2: Match zero times in regex for counter-example data demonstrating it.

      Lookbehind was my first thought, but it doesn't work unless one changes what is currently allowed.
      from: 'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255'
      IP pair: '192.168.100.0' '0.0.0.255'
      from: 'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255'
      IP pair: '10.11.1.1' '192.168.100.0'
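      Why a lookbehind alone can fail: (?<! host) is checked only at the position where \s+ starts, and \s+ may start at any whitespace character. If more than one whitespace character follows host, the match can begin at the second one, whose preceding text is not host. The sketch below assumes such an input (two spaces after host; my own example, which may differ from the counter-example data) and a simplified IPv4 pattern in place of Regexp::Common's. The extra (?<! \s) is one possible workaround, not something proposed in the thread: it forces \s+ to start at the beginning of a whitespace run.

```perl
use strict;
use warnings;

# Simplified stand-in for $RE{net}{IPv4} (an assumption, not Regexp::Common).
my $ip = qr/\d{1,3}(?:\.\d{1,3}){3}(?!\d)/;

# Collect every "addr mask" pair that $re finds in $entry.
sub pairs_with {
    my ($entry, $re) = @_;
    my @pairs;
    while ($entry =~ /$re/g) {
        push @pairs, "$1 $2";
    }
    return @pairs;
}

# Lookbehind only: \s+ may begin at any whitespace character.
my $naive = qr/(?<!host)\s+($ip)\s+($ip)/;
# Also forbid starting mid-run, so the host check sees the whole gap.
my $fixed = qr/(?<!host)(?<!\s)\s+($ip)\s+($ip)/;

my $entry = 'permit ip host  10.11.1.1 192.168.100.0 0.0.0.255';  # two spaces after host
print "naive: $_\n" for pairs_with($entry, $naive);
print "fixed: $_\n" for pairs_with($entry, $fixed);
```

      On this input the naive pattern starts at the second space and captures 10.11.1.1 with 192.168.100.0; the fixed one yields only 192.168.100.0 with 0.0.0.255.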

        From the OP:

        ... if the entry is
            permit ip host 10.11.1.1 192.168.100.0 0.0.0.255
        I want to pull out
            192.168.100.0 0.0.0.255
        but if the entry is
            permit ip 10.11.1.0  0.0.0.255 192.168.100.0 0.0.0.255
        I want to pull out
            10.11.1.0  0.0.0.255
            192.168.100.0 0.0.0.255

        I don't see how an IP after 'host' (10.11.1.1 in the example) is ever meant to be captured. Am I missing something (wouldn't be the first time)?

      The negative lookbehind is what the OP was trying to emulate with the {0}. Using negative lookbehind and negative lookahead assertions in the regex is definitely the way to go; this is a great solution, IMHO...

      Just a query: in your regexes, why did you use the m modifier?

        ... why did you use the m modifier?

        This is in line with the regex recommendations of TheDamian's Perl Best Practices (PBP). The /m modifier makes the ^ and $ anchors also match after/before embedded newlines. Invariably using /m together with /s (dot matches everything, including newlines) reduces the number of degrees of freedom these operators enjoy. In turn, this reduces potential maintenance headaches (I'll show you my scars sometime) and the general brain-hurt associated with regexes.
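        A small sketch of what /m and /s actually change, on a made-up two-line string. It also shows \A, the anchor PBP pairs with /m for "start of string", which /m does not affect.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /m, ^ matches only at the start of the string.
my $caret_plain = ($text =~ /^second/)  ? 1 : 0;
# With /m, ^ also matches just after an embedded newline.
my $caret_m     = ($text =~ /^second/m) ? 1 : 0;
# \A always means start of string, with or without /m.
my $anchor_a    = ($text =~ /\Asecond/m) ? 1 : 0;

# Without /s, . does not match a newline.
my $dot_plain = ($text =~ /line.second/)  ? 1 : 0;
# With /s, . matches any character, newline included.
my $dot_s     = ($text =~ /line.second/s) ? 1 : 0;

print "^ plain=$caret_plain, ^ with /m=$caret_m, \\A with /m=$anchor_a\n";
print ". plain=$dot_plain, . with /s=$dot_s\n";
```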

        The PBP recommendations in general and those for regexes in particular are controversial. (See especially BrowserUk for vigorous counter-argument; also, I think, the JavaFan.) I find many, perhaps most, of the recommendations to have compelling arguments in their favor and I rigorously (dare I say blindly?) use those pertaining to regexes.

      Thank you. I hadn't heard of Regexp::Common::net. I'll be looking into it.
Re: Match zero times in regex
by ikegami (Patriarch) on Dec 13, 2011 at 00:30 UTC

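    It does match zero times. In your code, (?:host){0} matches "host" zero times, which is the same as matching the empty string, so it never rules anything out. The match simply starts at position 14 (the space after host), where \s+ matches that space and the captures pick up 10.11.1.1 and 192.168.100.0: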

              1         2         3         4
    012345678901234567890123456789012345678901234567
    permit ip host 10.11.1.1 192.168.100.0 0.0.0.255
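    The same point can be checked directly: the sketch below runs the OP's pattern with and without the {0} group and records where each match starts, using the special array @- ($-[0] is the offset where the whole match began).

```perl
use strict;
use warnings;

my $entry = 'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255';
my $ip    = qr/\d+\.\d+\.\d+\.\d+/;

# {0} means "match this group zero times", which always succeeds
# without consuming anything, so the group imposes no condition.
my ($with_zero, $start_with_zero);
if ($entry =~ /(?:host){0}\s+($ip)\s+($ip)/) {
    $with_zero       = "$1 $2";
    $start_with_zero = $-[0];    # offset where the overall match began
}

# The same pattern with the {0} group deleted.
my ($without, $start_without);
if ($entry =~ /\s+($ip)\s+($ip)/) {
    $without       = "$1 $2";
    $start_without = $-[0];
}

print "with {0}:    '$with_zero' starting at $start_with_zero\n";
print "without {0}: '$without' starting at $start_without\n";
```

    Both versions capture 10.11.1.1 and 192.168.100.0, starting at offset 14.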

    What about

    if ( my ($pairs) = $entry =~ /^ \s* permit \s+ ip \s+ (?: host \s+ \S+ \s+ )? (.*)/x ) {
        while ( $pairs =~ /(\S+) \s+ (\S+)/xg ) {
            my ($ip, $mask) = ($1, $2);
            # ... do something with $ip and $mask ...
        }
    }

    Update: Fixed problem with solution.
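    To check the approach against both of the OP's sample entries, it can be wrapped in a sub. The name extract_pairs is my own hypothetical choice; the two regexes are the ones from the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the two-stage approach from the reply:
# strip "permit ip" and an optional "host <addr>", then walk the rest in pairs.
sub extract_pairs {
    my ($entry) = @_;
    my @found;
    if ( my ($pairs) = $entry =~ /^ \s* permit \s+ ip \s+ (?: host \s+ \S+ \s+ )? (.*)/x ) {
        while ( $pairs =~ /(\S+) \s+ (\S+)/xg ) {
            my ($ip, $mask) = ($1, $2);
            push @found, "$ip $mask";
        }
    }
    return @found;
}

for my $entry (
    'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255',
    'permit ip 10.11.1.0  0.0.0.255 192.168.100.0 0.0.0.255',
) {
    print "$entry\n";
    print "  pair: $_\n" for extract_pairs($entry);
}
```

    Note that the second stage pairs up any whitespace-separated tokens; it assumes the remainder of the line contains only address/mask pairs.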

        Thanks, fixed. I hate having to use $1 and $2, but forgot that it's required here.

        However, the linked node has no bearing on my mistake. The linked node is about the result of list assignment in scalar context, while my mistake was desiring the scalar context behaviour of m//g while calling it in list context.

        PS: Mini-Tutorial: Scalar vs List Assignment Operator is much more comprehensive about how assignment behaves depending on context.
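        A sketch of the context pitfall discussed above. The commented-out loop shows the general shape of the mistake (my illustration, not necessarily the code from the original reply); the two working loops show the scalar-context form used in the fix and a list-context alternative that avoids $1/$2.

```perl
use strict;
use warnings;

my $pairs = '10.11.1.0 0.0.0.255 192.168.100.0 0.0.0.255';

# Do NOT write:  while (my ($ip, $mask) = $pairs =~ /(\S+)\s+(\S+)/g) { ... }
# The list assignment puts m//g in list context, so each evaluation starts
# again from the beginning of the string and returns all four captures;
# the assignment's scalar value (the count of right-hand elements) is
# therefore always 4, and the loop never ends.

# Option 1: scalar-context m//g, one match per iteration, via $1 and $2.
my @scalar_way;
while ($pairs =~ /(\S+)\s+(\S+)/g) {
    push @scalar_way, "$1 $2";
}

# Option 2: list-context m//g collects every capture at once;
# splice then hands them out two at a time until the array is empty.
my @list_way;
my @captures = $pairs =~ /(\S+)\s+(\S+)/g;
while (my ($ip, $mask) = splice(@captures, 0, 2)) {
    push @list_way, "$ip $mask";
}

print "scalar: $_\n" for @scalar_way;
print "list:   $_\n" for @list_way;
```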

      Thanks, I see now why my regex wasn't working the way I wanted it to.
Re: Match zero times in regex
by JavaFan (Canon) on Dec 13, 2011 at 00:21 UTC
    Works for me:
    use feature 'say';   # say() needs this (or -E / use 5.010)
    $_ = "permit ip 10.11.1.0 0.0.0.255 192.168.100.0 0.0.0.255";
    while (/(?:host){0}\s+(\d+\.\d+\.\d+\.\d+)\s+(\d+\.\d+\.\d+\.\d+)/g) {
        say "$1 $2";
    }
    __END__
    10.11.1.0 0.0.0.255
    192.168.100.0 0.0.0.255
    I copy-and-pasted your regexp.

    I don't see why you'd use (?:host){0}, though; it doesn't add anything.

      Now try the other example he gave.
Re: Match zero times in regex
by vinian (Beadle) on Dec 13, 2011 at 00:35 UTC

    Maybe this is what you want: (?:host){0,1} or (?:host)?; both match zero or one time.
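    A quick sketch confirming that the two spellings behave identically on the OP's first entry. Note that in an unanchored pattern like the OP's, an optional host still allows the match to begin at host and capture the address that follows it.

```perl
use strict;
use warnings;

my $entry = 'permit ip host 10.11.1.1 192.168.100.0 0.0.0.255';
my $ip    = qr/\d+\.\d+\.\d+\.\d+/;

# {0,1} and ? are two spellings of "zero or one time".
my (@with_braces, @with_question);
while ($entry =~ /(?:host){0,1}\s+($ip)\s+($ip)/g) { push @with_braces,   "$1 $2" }
while ($entry =~ /(?:host)?\s+($ip)\s+($ip)/g)     { push @with_question, "$1 $2" }

print "{0,1}: @with_braces\n";
print "?:     @with_question\n";
```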

Re: Match zero times in regex
by TJPride (Pilgrim) on Dec 13, 2011 at 09:41 UTC
    use strict;
    use warnings;
    my $mask = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';
    while (<DATA>) {
        chomp;
        if (@_ = m/permit ip(?: host)?\s+($mask)\s+($mask)\s+($mask)(?:\s+($mask))?$/) {
            no warnings 'uninitialized';
            if ($2 eq '192.168.100.0' && $3 eq '0.0.0.255'
                || $3 eq '192.168.100.0' && $4 eq '0.0.0.255') {
                print "@_\n";
            }
        }
    }
    __DATA__
    permit ip host 10.11.1.1 192.168.100.0 0.0.0.255
    permit ip 10.11.1.0 0.0.0.255 192.168.100.0 0.0.0.255