Clarify this Regular Expression

dru145 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Clarify this Regular Expression by Masem (Monsignor) on Aug 17, 2001 at 00:57 UTC
The ^ at the very start of the regex indicates the beginning of the line, as opposed to negation. Thus, the regex looks for a typical numerical IP address at the start of a line; if one is found, it splits the line on whitespace and collects the parts in an array. Mind you, this isn't perfect (an IP of 999.999.999.999 will match even though it isn't legitimate), but it will make sure that any line that starts with something that looks like an IP is treated differently from everything else. ----------------------------------------------------- Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
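A small sketch of the behaviour Masem describes, assuming lines are processed one at a time. The pattern only looks at the start of each line, and it happily accepts 999.999.999.999 because `\d{1,3}` checks how many digits there are, not their value. The sample lines are invented for illustration.

```perl
use strict;
use warnings;

# The crude pattern from the question: four groups of 1-3 digits at the line start.
my $crude = qr/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;

# Invented sample lines.
my @lines = (
    "192.168.0.1 www.example.com",   # IP at the start: matches
    "999.999.999.999 bogus",         # not a real IP, but still matches
    "host 10.0.0.1",                 # IP not at the start: no match
);

my @matched;
for my $line (@lines) {
    # ^ anchors the match at the beginning of the string (here, one line).
    push @matched, $1 if $line =~ $crude;
}

print "$_\n" for @matched;
```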
Re: Re: Clarify this Regular Expression by chumley (Sexton) on Aug 17, 2001 at 02:34 UTC
Mastering Regular Expressions has a discussion of a regex that matches IP addresses without matching anything that's not an IP. So Masem is correct in saying that this is a crude match. However, I would point out that sometimes you can live with this. If you know that the file you're searching will never contain something that looks like an IP address but isn't, then you can use a simpler regular expression. I have used `m/^(10\.\d+\.\d+\.\d+)/` to pull IP addresses out of files when I know that all the IPs on the LAN I'm concerned with start with 10, and none of the files I'm searching will have anything that could return a false match. Chumley Imagine a really clever .sig here.
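A minimal sketch of chumley's shortcut, assuming (as he does) that every address of interest is on a 10.x.x.x LAN and nothing else in the input could produce a false match. The input lines are invented; in practice they would come from a file.

```perl
use strict;
use warnings;

# Assumption: only 10.x.x.x addresses matter, and nothing else in the
# input starts with "10." followed by dotted digits.
my @lines = (
    "10.1.2.3 fileserver",
    "10.20.30.40 printer",
    "192.168.1.1 router",
);

my @lan_ips;
for my $line (@lines) {
    # \d+ accepts any number of digits; the "10." prefix does the filtering.
    push @lan_ips, $1 if $line =~ m/^(10\.\d+\.\d+\.\d+)/;
}

print "$_\n" for @lan_ips;
```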
Re: Clarify this Regular Expression by larryk (Friar) on Aug 17, 2001 at 01:05 UTC
Just to clear this up, the whole line is: `push @ips, [ split /\s+/ ] if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;` which splits the current line on one or more whitespace characters into an anonymous array and pushes a reference to that array onto the `@ips` array, _but_only_if_ the regex that follows (crudely) matches an IP at the beginning of the current line... phew. I believe you are confusing `\d` and `\D`: `\d` matches any digit (for plain ASCII text, equivalent to `[0-9]`), and `\D` matches any non-digit (equivalent to `[^0-9]`). Hope this helps, larryk perl -le "s,,reverse killer,e,y,rifle,lycra,,print"
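To see what the one-liner larryk quotes actually builds, here is a sketch that runs it over a few invented lines held in an array (in the original code the lines would come from a filehandle).

```perl
use strict;
use warnings;

# Invented input; the original code would read these lines from a file.
my @input = (
    "10.0.0.1   alpha   up\n",
    "# a comment line\n",
    "10.0.0.2 beta down\n",
);

my @ips;
for (@input) {    # aliases $_, which both the match and split use by default
    push @ips, [ split /\s+/ ] if /^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;
}

# Each element of @ips is a reference to an array of that line's fields.
for my $fields (@ips) {
    print join(',', @$fields), "\n";
}
```

Note that `split /\s+/` yields an empty first field if a line begins with whitespace; that cannot happen here, because the match requires a digit at the very start of the line. The trailing newline produces an empty last field, which `split` drops.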
Re: Clarify this Regular Expression by Sifmole (Chaplain) on Aug 17, 2001 at 01:08 UTC
The pattern could also be reduced to `/^(\d{1,3}\.){3}\d+/`. Not for the sake of golfing, but it seems to me easier to read and more obvious what it is matching.
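A quick sketch, with invented test strings, suggesting the shorter pattern accepts the same lines as the original: neither pattern is anchored at the end, so `\d+` and `\d{1,3}` both simply require at least one digit after the third dot. What differs is the capture, since in the shorter form `$1` holds only the last repetition of the group.

```perl
use strict;
use warnings;

# Original pattern from the question and Sifmole's shorter version.
my $long  = qr/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;
my $short = qr/^(\d{1,3}\.){3}\d+/;

# Invented strings, including some edge cases.
my @samples = ('1.2.3.4', '1.2.3.4567', '1234.1.1.1', '1.2.3', 'x1.2.3.4', '999.999.999.999');

# Strings on which the two patterns disagree about matching.
my @differ = grep { ($_ =~ $long ? 1 : 0) != ($_ =~ $short ? 1 : 0) } @samples;
print scalar(@differ), " disagreements\n";

# The captured text is not the same, though.
my ($long_cap)  = '1.2.3.4' =~ $long;     # whole address
my ($short_cap) = '1.2.3.4' =~ $short;    # last repetition of the group only
print "long captures '$long_cap', short captures '$short_cap'\n";
```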
Re: Re: Clarify this Regular Expression by tachyon (Chancellor) on Aug 17, 2001 at 03:34 UTC
For efficiency you should avoid capturing with the parentheses, as in `/^(?:\d{1,3}\.){3}\d+/`, since you don't use the result. Here is a simple, more accurate way to look for IPs. We just capture each of the 4 bytes into $1-$4 and can then test them however we like. Here we just check that they are <256, but you could require, say, $1 to be 10 or whatever.

```perl
use strict;
use warnings;

while (<DATA>) {
    # Capture each of the four dot-separated numbers into $1..$4.
    if (/(\d+)\.(\d+)\.(\d+)\.(\d+)/) {
        print "$1.$2.$3.$4 ";
        # Each byte of an IPv4 address must be in the range 0..255.
        if ($1 < 256 and $2 < 256 and $3 < 256 and $4 < 256) {
            print "is IP\n";
        }
        else {
            print "is not an IP\n";
        }
    }
}

__DATA__
0.0.0.0
255.255.255.255
1.2.3.4
256.1.1.1
```

cheers tachyon s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
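A sketch of tachyon's capture-then-compare idea wrapped in a helper. `is_ipv4` is a hypothetical name, and unlike tachyon's loop the pattern here is anchored at both ends, so strings such as `1.2.3.4.5` are rejected instead of partially matched.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s is exactly a dotted quad whose four
# numbers are each at most 255.
sub is_ipv4 {
    my ($s) = @_;
    # Capture the four numbers; anchored so nothing may precede or follow.
    my @octets = $s =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
        or return 0;
    # Reject the address if any captured number exceeds 255.
    return (grep { $_ > 255 } @octets) ? 0 : 1;
}

for my $s ('0.0.0.0', '255.255.255.255', '1.2.3.4', '256.1.1.1', '1.2.3.4.5') {
    printf "%-16s %s\n", $s, is_ipv4($s) ? "is IP" : "is not an IP";
}
```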
Re: Re: Re: Clarify this Regular Expression by Sifmole (Chaplain) on Aug 17, 2001 at 16:41 UTC
You are quite right; there is no reason to capture the result with parentheses in my version.
Re: Clarify this Regular Expression by traveler (Parson) on Aug 17, 2001 at 01:00 UTC
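You are very close. The ^ means "not" in character classes like `[^0-9]`, which matches non-digits (it has this meaning only as the first character inside the square brackets). However, when it is not in square brackets it matches the beginning of the line (strictly, the beginning of the string, which is the same thing when you process one line at a time), and that is what ar0n's expression does. HTH, --traveler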
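A small sketch of the two meanings of `^` that traveler describes, plus larryk's point that `\D` covers the same characters as `[^0-9]`; the test string is arbitrary.

```perl
use strict;
use warnings;

my $s = "abc 123";

# Inside square brackets, a leading ^ negates the class: [^0-9] is any
# non-digit, the same set as \D.
my @non_digits   = $s =~ /([^0-9])/g;
my @non_digits_D = $s =~ /(\D)/g;

# Outside brackets, ^ anchors the match at the start of the string.
my $starts_with_digit = $s =~ /^\d/ ? 1 : 0;    # 0: the string starts with "a"
my $contains_digit    = $s =~ /\d/  ? 1 : 0;    # 1: digits appear later on

print "non-digits: '", join('', @non_digits), "'\n";
print "same as \\D: ", ("@non_digits" eq "@non_digits_D" ? "yes" : "no"), "\n";
print "starts with digit: $starts_with_digit, contains digit: $contains_digit\n";
```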
Re: Clarify this Regular Expression by Anonymous Monk on Aug 17, 2001 at 22:10 UTC
In keeping with TMTOWTDI, the following will correctly match an IPv4 address at the start of a scalar: `/^((\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}(\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])(\s|$)/` And broken down (to make it look even more complicated):

- `/` (begin pattern match)
- `^` (match start of scalar)
- `(` (begin grouping match of '0.' to '255.')
- `(` (begin saving match of '0' to '255', to \2)
- `\d{1,2}|` (match '0' to '99' or...)
- `1\d{2}|` (match '100' to '199' or...)
- `2[0-4]\d|` (match '200' to '249' or...)
- `25[0-5]` (match '250' to '255')
- `)` (end saving match of '0' to '255', to \2)
- `\.` (match a '.')
- `)` (end grouping match of '0.' to '255.')
- `{3}` (match '0.' to '255.' 3 times)
- `(\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])` (match a final '0' to '255'; a backreference such as `\2` would not work here, since it matches only the exact text captured for the third octet)
- `(\s|$)` (match whitespace or scalar end (when \n absent))
- `/` (end pattern match)

======= BTW, that: `perl -le "s,,reverse killer,e,y,rifle,lycra,,print"` ...is pretty `/swe{3,}t/`. :) LaurenceHunter => "Life is loud, life is 60p/min."
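A quick check, on invented addresses, of the octet alternation written out for all four positions, next to the same pattern ending in the backreference `\2`. The backreference re-matches only the exact digits captured for the third octet, so it accepts 1.2.3.3 but rejects 1.2.3.4.

```perl
use strict;
use warnings;

# The octet alternation from the post: 0-99, 100-199, 200-249, 250-255.
my $octet = qr/(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])/;

# Alternation used for all four octets.
my $full = qr/^(?:$octet\.){3}$octet(?:\s|$)/;

# Same shape, but the last octet is the backreference \2, which can only
# match the exact text captured for the third octet.
my $backref = qr/^((\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}\2(\s|$)/;

my @samples = ('1.2.3.4', '1.2.3.3', '192.168.0.255', '256.1.1.1', '10.0.0.1 host');

my @full_hits    = grep { $_ =~ $full } @samples;
my @backref_hits = grep { $_ =~ $backref } @samples;

print "alternation matches: @full_hits\n";
print "backref matches:     @backref_hits\n";
```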
Re: Clarify this Regular Expression by Anonymous Monk on Aug 18, 2001 at 07:51 UTC
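Oops: if the line should contain nothing but the address, it's better to use `\Z` than the final `(\s|$)` I suggested (see post above); `\Z` matches only at the end of the string or just before a final newline. The MkII, matching a line that is a valid IPv4 address (drum-roll please): `/^((\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}(\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\Z/` LaurenceHunter => "Life is loud, life is 60p/min."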
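A sketch contrasting the two endings on invented input, using the octet alternation written out in full for the last position. Without the `/m` flag, `$` and `\Z` match at the same places (end of string or just before a final newline), so the practical change is dropping the `\s` alternative: text after the address is no longer allowed.

```perl
use strict;
use warnings;

# The octet alternation from the posts: 0-99, 100-199, 200-249, 250-255.
my $octet = qr/(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])/;

# Ending from the first post: whitespace or end of string may follow.
my $ws_or_end = qr/^(?:$octet\.){3}$octet(?:\s|$)/;
# MkII ending: \Z, i.e. end of string or just before a final newline.
my $whole = qr/^(?:$octet\.){3}$octet\Z/;

my @samples    = ("1.2.3.4", "1.2.3.4\n", "1.2.3.4 host");
my @ws_hits    = grep { $_ =~ $ws_or_end } @samples;
my @whole_hits = grep { $_ =~ $whole } @samples;

for my $s (@samples) {
    (my $shown = $s) =~ s/\n/\\n/;    # make a trailing newline visible
    printf "%-14s (\\s|\$): %-5s  \\Z: %s\n", $shown,
        ($s =~ $ws_or_end ? 'match' : 'no'),
        ($s =~ $whole     ? 'match' : 'no');
}
```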