PerlMonks  

Re: Seeker of Regex Wisdom (strings which don't form specific patterns)

by Laurent_R (Canon)
on Aug 12, 2015 at 20:42 UTC ( [id://1138346]=note )


in reply to Seeker of Regex Wisdom (strings which don't form specific patterns)

The question is far from clear, and I don't understand whether or not you want to keep lines starting with #.

Assuming you just want to keep all the lines which have an IP address, or something that looks very much like an IP address, you might try something like this:

perl -ne 'print if /(\d{1,3}\.){3}\d{1,3}/;'
Of course, if you want to be more selective, you could check that the captures are smaller than 256:
perl -ne 'print if /(\d{1,3}\.){3}(\d{1,3})/ and $1 < 256 and ... and $4 < 256;'
It really depends on your data. In many cases the first, simple regex is quite sufficient; in others, you really need to be sure that you don't keep something like "345.765.5.34", which is obviously not an IP address, whatever else it may be.

Update: my code line above is wrong, as explained and shown below by AnomalousMonk: capture groups don't change their numbering under a counted quantifier.

It would have to be something like this:

perl -ne 'print if /(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/ and $1 < 256 and ... and $4 < 256;'
but that's probably getting a bit too convoluted for a one-liner.
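
If the one-liner gets too unwieldy, the same idea could be spelled out as a short script. This is only a sketch under a few assumptions: the sub name has_ipv4 and the sample lines are made up for illustration, [0-9] is used in place of \d, and, like the one-liner, it only inspects the first match on each line and is not anchored.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative only): true if the line contains
# four dot-separated runs of 1-3 digits, each of them smaller than 256.
sub has_ipv4 {
    my ($line) = @_;

    # Four explicit capture groups, so that each octet gets its own capture.
    my @octets = $line =~ /([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})/
        or return 0;    # no match at all

    return 0 if grep { $_ >= 256 } @octets;    # at least one octet too large
    return 1;
}

# Made-up sample lines; real input would come from a file or STDIN.
for my $line ('host 192.168.1.10 up', '345.765.5.34', '# no address here') {
    print "$line\n" if has_ipv4($line);
}
```

Of the three sample lines, only the first passes both the regex and the arithmetic check.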

Replies are listed 'Best First'.
Re^2: Seeker of Regex Wisdom (strings which don't form specific patterns)
by Discipulus (Canon) on Aug 13, 2015 at 08:01 UTC
    Or, more selectively and more verbosely too, as found in 'Mastering Regular Expressions':
    ^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$


    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      As selective, less verbose, non-repetitive, more readable, better IMHO:

          my $octet_dec = qr{ [01]?\d\d? | 2[0-4]\d | 25[0-5] }xms;

          my $ipv4_dec = qr{ $octet_dec (?: [.] $octet_dec){3} }xms;

      Update: Or better yet, as already mentioned, Regexp::Common::net.
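
A possible way to put the two qr// objects above to work, as a sketch only: the helper name is_ipv4, the \A and \z anchors, and the sample strings are my own assumptions, not part of the original suggestion. The anchors are there because $ipv4_dec on its own would also match inside a longer string.

```perl
use strict;
use warnings;

my $octet_dec = qr{ [01]?\d\d? | 2[0-4]\d | 25[0-5] }xms;
my $ipv4_dec  = qr{ $octet_dec (?: [.] $octet_dec){3} }xms;

# Hypothetical helper: true only if the whole string is a dotted quad
# whose four parts are each in the range 0..255.
sub is_ipv4 {
    my ($str) = @_;
    return $str =~ m{ \A $ipv4_dec \z }xms ? 1 : 0;
}

# Made-up sample strings.
for my $candidate (qw(12.34.56.78 345.765.5.34 255.255.255.255 1.2.3)) {
    printf "%-16s %s\n", $candidate, is_ipv4($candidate) ? 'ok' : 'rejected';
}
```

Because each qr// object is interpolated as its own group, the alternation in $octet_dec stays contained when it is reused inside $ipv4_dec.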


      Give a man a fish:  <%-(-(-(-<

      Hi Discipulus,

      your suggestion is a pure regex, which makes perfect sense in J. Friedl's book, but I do not think it is more selective than my proposal, which mixes a regex with some arithmetic and looks for four dot-separated integers smaller than 256. (Except that I used \d instead of [0-9] for brevity, so that my regex might match (non-Arabic) Unicode digits, but that's easily fixed.)
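
To illustrate the \d remark, here is a small sketch of two possible fixes: an explicit [0-9] class, or the /a modifier (available from Perl 5.14 on). The variable names and the test string made of Arabic-Indic digits (U+0661 to U+0665) are assumptions chosen for the demonstration.

```perl
use strict;
use warnings;

my $ascii  = '12.3.4.5';
# Same shape, but written with Arabic-Indic digits instead of ASCII ones.
my $exotic = "\x{0661}\x{0662}.\x{0663}.\x{0664}.\x{0665}";

my $loose      = qr/(\d{1,3}\.){3}\d{1,3}/;          # \d may match any Unicode digit
my $strict     = qr/([0-9]{1,3}\.){3}[0-9]{1,3}/;    # ASCII digits only
my $ascii_flag = qr/(\d{1,3}\.){3}\d{1,3}/a;         # /a restricts \d to ASCII

for my $test ( [ 'loose', $loose ], [ 'strict', $strict ], [ '/a', $ascii_flag ] ) {
    my ($name, $re) = @$test;
    printf "%-6s ascii:%s exotic:%s\n", $name,
        ( $ascii  =~ $re ? 'match' : 'no' ),
        ( $exotic =~ $re ? 'match' : 'no' );
}
```

All three patterns accept the ASCII string; only the plain \d version also accepts the Arabic-Indic one.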

        perl -ne 'print if /(\d{1,3}\.){3}(\d{1,3})/ and $1 < 256 and ... and $4 < 256;'

        Unfortunately, capture groups don't change their numbering under a counted quantifier (a misapprehension I've suffered more than once), so it's necessary to use four explicit captures for the above to work:

        c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
        "$_ = '12.34.56.78';
         ;;
         /(\d{1,3}\.){3}(\d{1,3})/;
         dd [ $1, $2, $3, $4 ];
         ;;
         /(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/;
         dd [ $1, $2, $3, $4 ];
         "
        ["56.", 78, undef, undef]
        [12, 34, 56, 78]


        Give a man a fish:  <%-(-(-(-<
