dru145 has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I am having trouble determining what this regular expression does:
[ split /\s+/ ] if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/; }
This was posted by ar0n in my node Automating Firewall Log Reporting. I sent ar0n a "msg" asking for clarification, but he either didn't get it or was too busy to respond. Rather than be a nuisance to him, I figured another monk would be able to help me.

This matches correctly, but I can't figure out what it is doing. The way I understand it, it is splitting on one or more whitespace characters and matching anything that is NOT a digit and is one to three characters long? This is where I am confused, because the numbers in an IP address range from 1 to 255, and I would think those do count as digits.

TIA

Dru

Replies are listed 'Best First'.
Re: Clarify this Regular Expression
by Masem (Monsignor) on Aug 17, 2001 at 00:57 UTC
    The ^ at the very start of the regex indicates the beginning of the line, as opposed to negation. Thus, the regex looks for a typical numerical IP address at the start of a line; if found, it splits the line on whitespace and returns the parts as an array. Mind you, this isn't perfect (an IP of 999.999.999.999 will match though it is not legitimate), but it will make sure that any line that starts with something that looks like an IP is treated differently from everything else.
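
To make the "isn't perfect" caveat concrete, here is a small sketch of how the anchored pattern behaves. The helper name starts_with_ip_like and the sample lines are invented for illustration; they are not from ar0n's code or a real firewall log.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line starts with something shaped like an IPv4 address.
sub starts_with_ip_like {
    my ($line) = @_;
    return $line =~ /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ ? 1 : 0;
}

# Invented sample lines.
for my $line ('192.168.1.10 accepted tcp', '999.999.999.999 bogus', 'src=10.0.0.1 dropped') {
    printf "%-26s %s\n", $line, starts_with_ip_like($line) ? 'matches' : 'no match';
}
```

The first two lines match (the second one despite not being a valid address), and the third does not, because its address is not at the start of the line.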

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain

      Mastering Regular Expressions has a discussion of a regex to match IP addresses, without matching anything that's not an IP. So Masem is correct in saying that this is a crude match.

      However, I would point out that sometimes you can live with this. If you know that the file you're searching will never have something that looks like an IP address but isn't, then you can use a simpler regular expression. I have used m/^(10\.\d+\.\d+\.\d+)/ to pull IP addresses out of files when I know that all the IPs on the LAN I'm concerned with start with 10, and none of the files I'm searching will have anything that could return a false match.

      Chumley

      Imagine a really clever .sig here.

Re: Clarify this Regular Expression
by larryk (Friar) on Aug 17, 2001 at 01:05 UTC
    just to clear this up - the whole line is...
    push @ips, [ split /\s+/ ] if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;
    which is splitting the current line on one/more whitespaces into an anonymous array and pushing a ref to that array onto the @ips array _but_only_if_ the following regex matches an IP (crudely) at the beginning of the current line... *phew*
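
As a rough sketch of what that line builds, the following wraps it in a loop over a few invented lines. The wrapper collect_ip_lines and the sample data are assumptions made for illustration; only the push statement itself comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-liner: collect the split fields of IP-led lines.
sub collect_ip_lines {
    my @ips;
    for (@_) {    # each element is aliased to $_, which split and the match both use
        push @ips, [ split /\s+/ ] if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;
    }
    return @ips;
}

# Invented sample data.
my @ips = collect_ip_lines(
    '10.1.2.3 80 tcp allow',
    'Firewall log started',
    '172.16.0.9 22 tcp deny',
);

print scalar(@ips), " rows\n";         # how many lines started with an IP-like string
print "$ips[1][0] -> $ips[1][3]\n";    # first and fourth field of the second kept line
```

Each element of @ips is a reference to an array of fields, so $ips[1][0] is the first whitespace-separated field of the second line that matched.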

    I believe you are confusing \d and \D
    \d matches any digit (equiv to [0-9])
    \D matches any non-digit (equiv to [^0-9])
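
A quick way to see the difference is to run both classes over a few short strings. The helper names only_digits and has_non_digit are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration.
sub only_digits   { return $_[0] =~ /\A\d+\z/ ? 1 : 0 }   # every character is a digit
sub has_non_digit { return $_[0] =~ /\D/      ? 1 : 0 }   # at least one non-digit

for my $s ('192', '19a', 'abc') {
    printf "%-4s only_digits=%d has_non_digit=%d\n", $s, only_digits($s), has_non_digit($s);
}
```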

    hope this helps

       larryk                                          
    perl -le "s,,reverse killer,e,y,rifle,lycra,,print"
Re: Clarify this Regular Expression
by Sifmole (Chaplain) on Aug 17, 2001 at 01:08 UTC
    The pattern could also be reduced to
    /^(\d{1,3}\.){3}\d+/
    Not for the sake of golfing, but it seems to me easier to read and more obvious what it is matching.

      For efficiency you should avoid capturing with the parentheses, as in /^(?:\d{1,3}\.){3}\d+/, since you don't use the result. Here is a simple, more accurate way to look for IPs. We just capture each of the 4 bytes into $1-$4 and can then test them however we like. Here we just check that they are <256, but you could require, say, $1 to be 10 or whatever.

      while (<DATA>) {
          if (/(\d+)\.(\d+)\.(\d+)\.(\d+)/) {
              print "$1.$2.$3.$4 ";
              if ($1<256 and $2<256 and $3<256 and $4<256) {
                  print "is IP\n";
              } else {
                  print "is not an IP\n";
              }
          }
      }
      __DATA__
      0.0.0.0
      255.255.255.255
      1.2.3.4
      256.1.1.1

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        You are very correct, there is no reason to capture the result via parentheses in my version.
Re: Clarify this Regular Expression
by traveler (Parson) on Aug 17, 2001 at 01:00 UTC
    You are very close. The ^ means "not" inside "character classes" like [^0-9], which matches non-digits. However, when it is not inside square brackets it matches the beginning of the line, which is how ar0n's expression uses it.
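
The two meanings of ^ can be compared side by side. This is only an illustrative sketch; the helper names and the sample line are invented.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration.
sub starts_with_digit  { return $_[0] =~ /^\d/     ? 1 : 0 }   # ^ outside brackets: start of string
sub contains_non_digit { return $_[0] =~ /[^0-9]/  ? 1 : 0 }   # ^ inside brackets: negated class
sub starts_with_other  { return $_[0] =~ /^[^0-9]/ ? 1 : 0 }   # both uses in one pattern

my $line = '10.0.0.1 accepted';    # invented sample line
printf "starts_with_digit=%d contains_non_digit=%d starts_with_other=%d\n",
    starts_with_digit($line), contains_non_digit($line), starts_with_other($line);
```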

    HTH, --traveler

Re: Clarify this Regular Expression
by Anonymous Monk on Aug 17, 2001 at 22:10 UTC

    In keeping with TMTOWTDI, the following will correctly match an IPv4 address at the start of a scalar:

    /^((\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}\2(\s|$)/

    And broken down (to make it look even more complicated):

    /            (begin pattern match)
    ^            (match start of scalar)
    (            (begin grouping match of '0.' to '255.')
    (            (begin saving match of '0' to '255', to \2)
    \d{1,2}|     (match '0' to '99' or...)
    1\d{2}|      (match '100' to '199' or...)
    2[0-4]\d|    (match '200' to '249' or...)
    25[0-5]      (match '250' to '255')
    )            (end saving match of '0' to '255', to \2)
    \.           (match a '.')
    )            (end grouping match of '0.' to '255.')
    {3}          (match '0.' to '255.' 3 times)
    \2           (match '0' to '255')
    (\s|$)       (match whitespace or scalar end (when \n absent))
    /            (end pattern match)

    =======

    BTW, that:

    perl -le "s,,reverse killer,e,y,rifle,lycra,,print"

    ...is pretty /swe{3,}t/. :)

    LaurenceHunter => "Life is loud, life is 60p/min."

Re: Clarify this Regular Expression
by Anonymous Monk on Aug 18, 2001 at 07:51 UTC

    Oops, it's better to use \Z than that final (\s|$) I had suggested (see post above).

    The MkII Match valid IPv4 at start of line (drum-roll please):

    /^((\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}\2\Z/
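
One caveat worth checking before relying on this pattern: in Perl, \2 is a backreference, so it matches the same text that group 2 last captured rather than re-applying the 0-255 subpattern. The sketch below compares the posted pattern with a variant that reuses the octet pattern through a qr// variable. The names $mk2, $octet, $reuse and the two helper subs are illustrative assumptions, and the variant is not the poster's code.

```perl
use strict;
use warnings;

# The MkII pattern as posted: \2 repeats the text captured by group 2.
my $mk2 = qr/^((\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}\2\Z/;

# Variant (an assumption, not from the thread): interpolate the octet pattern itself.
my $octet = qr/(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])/;
my $reuse = qr/^(?:$octet\.){3}$octet\Z/;

sub matches_mk2   { return $_[0] =~ $mk2   ? 1 : 0 }
sub matches_reuse { return $_[0] =~ $reuse ? 1 : 0 }

for my $ip ('1.2.3.3', '1.2.3.4', '256.1.1.1') {
    printf "%-10s mk2=%d reuse=%d\n", $ip, matches_mk2($ip), matches_reuse($ip);
}
```

With the backreference, 1.2.3.3 matches but 1.2.3.4 does not, because the last octet has to repeat the third one. Note also that \Z anchors at the end of the string, so both patterns expect the address to be the whole scalar (apart from a trailing newline) rather than the start of a longer log line.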

    LaurenceHunter => "Life is loud, life is 60p/min."