ccrash has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's minimum standard of quality and will not be displayed.

Re: Need advice on PERL
by kabeldag (Hermit) on Dec 12, 2006 at 08:59 UTC
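    - ^ matches at the beginning of the string
    - \d matches a digit character
    - + matches the preceding item (here \d) one or more times
    - \. matches a literal dot (an unescaped . would match any character)
    - (...) groups a subexpression and captures its match into $1, $2, $3 ...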
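    To see what the ^ anchor changes, here is a small sketch using two made-up sample strings. With ^ the dotted quad has to start at the very first character; without it, Perl is free to find it anywhere in the string.

```perl
use strict;
use warnings;

# Made-up sample inputs: one starts with a dotted quad, one does not
my @samples = ('1.0.3.3_1.3.45.44', 'ip=1.0.3.3');

my (%anchored, %unanchored);
for my $s (@samples) {
    # With ^ the digits must start at the very first character
    ($anchored{$s}) = $s =~ /^(\d+\.\d+\.\d+\.\d+)/;
    # Without ^ the match may start anywhere in the string
    ($unanchored{$s}) = $s =~ /(\d+\.\d+\.\d+\.\d+)/;

    printf "%-18s anchored: %-12s unanchored: %s\n", $s,
        defined $anchored{$s}   ? $anchored{$s}   : '(no match)',
        defined $unanchored{$s} ? $unanchored{$s} : '(no match)';
}
```

    Both patterns find 1.0.3.3 in the first string, but only the unanchored one matches 'ip=1.0.3.3'.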

    In this case, $scrip receives the text matched by the whole parenthesized subexpression, (\d+\.\d+\.\d+\.\d+). Because there is only one pair of parentheses, only $1 is set:
    my ($scrip) = $ARGV[0] =~ /^(\d+\.\d+\.\d+\.\d+)/;
    print "\$scrip returned : $scrip\n";
    print "match 1 : $1\nmatch 2 : $2\nmatch 3 : $3\nmatch 4 : $4\n";
    INPUT
    C:\Perl\bin>perl bleh.pl 1.0.3.3_1.3.45.44

    OUTPUT
    $scrip returned : 1.0.3.3
    match 1 : 1.0.3.3
    match 2 :
    match 3 :
    match 4 :
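    For contrast, a sketch (reusing the sample argument above) that puts each number in its own pair of parentheses. With four capture groups, $1 through $4 are all filled in, and the same values come back as the list returned by the match.

```perl
use strict;
use warnings;

# Same sample argument as the run above
my $arg = '1.0.3.3_1.3.45.44';

# One capture group per number: the match returns four elements
my @parts = $arg =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)/;

if (@parts) {
    # $1..$4 hold the same values as $parts[0]..$parts[3]
    print "match 1 : $1\nmatch 2 : $2\nmatch 3 : $3\nmatch 4 : $4\n";
}
```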

    If \d+\.\d+\.\d+\.\d+ were not inside parentheses, a successful match in list context would return the list (1), so $scrip would be 1. A failed match returns an empty list, which leaves $scrip undefined:
    my ($scrip) = $ARGV[0] =~ /\d+\.\d+\.\d+\.\d+/;
    if ($scrip) {
        print "matched $ARGV[0] !\n";
    } else {
        print "did not match $ARGV[0] !\n";
    }
    print "\$scrip = $scrip\n";
    OUTPUT EXAMPLES:

    C:\Perl\bin>perl bleh.pl 13.3.3.3_
    matched 13.3.3.3_ !
    $scrip = 1

    C:\Perl\bin>perl bleh.pl 13.3.
    did not match 13.3. !
    $scrip =
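    The same behaviour can be checked without the command line. This sketch reuses the two sample arguments from the runs above and compares what the match returns in list context with and without a capture group. The empty list on failure is why $scrip ends up undefined and prints as nothing (with use warnings, Perl would also warn about the uninitialized value).

```perl
use strict;
use warnings;

my $good = '13.3.3.3_';
my $bad  = '13.3.';

# List context, no capture groups: (1) on success, () on failure
my @no_caps_good = $good =~ /\d+\.\d+\.\d+\.\d+/;
my @no_caps_bad  = $bad  =~ /\d+\.\d+\.\d+\.\d+/;

# List context, one capture group: the captured text on success
my @caps_good = $good =~ /(\d+\.\d+\.\d+\.\d+)/;

print "no captures, match:    (@no_caps_good)\n";
print "no captures, no match: (@no_caps_bad)\n";
print "one capture, match:    (@caps_good)\n";
```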
Re: Need advice on PERL
by lin0 (Curate) on Dec 12, 2006 at 13:33 UTC

    Hi ccrash,

    Here's the code. But I don't know what it means.
    my ($srcip) = $whole_event_string =~ /^(\d+\.\d+\.\d+\.\d+)/;
    I only understand that it is checking whether the entry would have something like IP Address as above. But does it pass the IP address to the $srcip variable?

    Yes, it does. Your code uses a regular expression to find a pattern that looks like an IP address. Before explaining how it does that, I recommend having a look at the Perl documentation on regular expressions (perlre, and the gentler tutorial perlretut). In your particular case, the variable $whole_event_string holds a log entry. Each log entry is checked to see whether it begins (that is the meaning of the ^ symbol) with the following sequence of characters:

    \d+ one or more digits
    \.  a dot
    \d+ one or more digits
    \.  a dot
    \d+ one or more digits
    \.  a dot
    \d+ one or more digits

    If there is a match, that sequence of characters is assigned to the variable $srcip. If there is no match, $srcip is left undefined.
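    The piece-by-piece description above maps directly onto the /x modifier, which lets Perl ignore whitespace inside a pattern and accept # comments there. A sketch of the same pattern written that way, using the first sample log line from this thread; note that comments inside the pattern must not contain the / delimiter or variable names, since the pattern is interpolated before the comments are stripped.

```perl
use strict;
use warnings;

# Sample log line taken from this thread
my $whole_event_string =
    '1.2.3.4 - Unauth [09/Oct/2003: 10:12:06 -0700] "GET / HTTP/1.1" 200 1979';

# Same pattern as /^(\d+\.\d+\.\d+\.\d+)/, spread out with the /x modifier
my ($srcip) = $whole_event_string =~ /
    ^             # start of the string
    (             # start capture group 1
      \d+ \.      # one or more digits, then a literal dot
      \d+ \.      # one or more digits, then a literal dot
      \d+ \.      # one or more digits, then a literal dot
      \d+         # one or more digits
    )             # end of capture group 1
/x;

print "\$srcip = $srcip\n";
```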

    The following code illustrates what I just described:

    #!/usr/bin/perl
    use strict;
    use warnings;

    while (defined (my $whole_event_string = <DATA>)) {
        my ($srcip) = $whole_event_string =~ /^(\d+\.\d+\.\d+\.\d+)/;
        print "\$srcip = $srcip\n";
    }

    __DATA__
    1.2.3.4 - Unauth [09/Oct/2003: 10:12:06 -0700] "GET / HTTP/1.1" 200 1979
    2.3.4.5 - Unauth [09/Oct/2004: 11:12:06 -0700] "GET / HTTP/1.1" 200 1979
    3.4.5.6 - Unauth [09/Oct/2005: 12:12:06 -0700] "GET / HTTP/1.1" 200 1979
    4.5.6.7 - Unauth [09/Oct/2006: 13:12:06 -0700] "GET / HTTP/1.1" 200 1979

    If you try it, the output should be:

    $srcip = 1.2.3.4
    $srcip = 2.3.4.5
    $srcip = 3.4.5.6
    $srcip = 4.5.6.7
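    The loop above prints $srcip for every line, so under use warnings any line that does not start with a dotted quad would trigger an "uninitialized value" warning. A minimal sketch of skipping such lines, using an in-memory list instead of __DATA__; the second line is a made-up example of a non-matching entry.

```perl
use strict;
use warnings;

# Hypothetical input: one real-looking entry, one line without a leading IP
my @lines = (
    '1.2.3.4 - Unauth [09/Oct/2003: 10:12:06 -0700] "GET / HTTP/1.1" 200 1979',
    'a line that does not start with an IP address',
);

my @found;
for my $whole_event_string (@lines) {
    my ($srcip) = $whole_event_string =~ /^(\d+\.\d+\.\d+\.\d+)/;
    next unless defined $srcip;    # no match: $srcip is undef, skip the line
    push @found, $srcip;
    print "\$srcip = $srcip\n";
}
```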

    I hope this helps

    lin0
Re: Need advice on PERL
by jonadab (Parson) on Dec 12, 2006 at 13:56 UTC

    Others have explained how the pattern match itself works, but they forgot to explain how the whole thing is parsed. The =~ pattern match operator binds more tightly than the = assignment operator, so the pattern match happens first. The parentheses around $srcip on the left side make this a list assignment, so the pattern match is evaluated in list context and returns a list of the things captured in parentheses during the match. (In scalar context it would instead return just true or false. Context is very important in Perl.) In this case it's a list of one thing, which looks like an IPv4 address. That list is assigned to a list of variables; here, a list of just one variable, $srcip. If you wanted to get the four numbers out of the dotted quad, you could do it like this:

    my (@ipnums) = $whole_event_string =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)/;

    Then the array @ipnums would have four entries in it, one for each of the four numbers in the dotted quad.
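    A sketch that puts the context rules above side by side, using a sample line from this thread: a scalar assignment gets the match's true/false result, a list assignment gets the captures, and explicit parentheses on the right change nothing because =~ already binds first. It also shows a related idiom: a list assignment used as a condition evaluates to the number of elements on its right-hand side.

```perl
use strict;
use warnings;

my $whole_event_string =
    '1.2.3.4 - Unauth [09/Oct/2003: 10:12:06 -0700] "GET / HTTP/1.1" 200 1979';

# Scalar assignment: the match runs in scalar context and yields 1 (true) on success
my $matched = $whole_event_string =~ /^(\d+\.\d+\.\d+\.\d+)/;

# List assignment: the match runs in list context and yields the captured text.
# Since =~ binds tighter than =, the parentheses on the right are redundant.
my ($srcip) = ($whole_event_string =~ /^(\d+\.\d+\.\d+\.\d+)/);

# Four capture groups give four list elements. As a condition, the list
# assignment is true (4) on a match and false (0) on a failed match.
my @ipnums;
if (@ipnums = $whole_event_string =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)/) {
    print "octets: @ipnums\n";
}

print "scalar context: $matched\n";
print "list context:   $srcip\n";
```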

    Similarly, if you want to capture more information than just the IP address, you could add to your regular expression and parse more fields with something along these lines...

    my ($srcip, $user, $timestamp, $request, $result) =
        $whole_event_string =~ /^(\d+\.\d+\.\d+\.\d+)\s+\S+\s+(\S+)\s+[[](.*?)[]]\s+\"(.*?)\"\s+(\d+)/;

    HTH.HAND. That regular expression may not be exactly right, because I'm not sure of the exact technical specs of the logfile format you're parsing (Is that an IIS log? yuck!), but it illustrates the principle anyway. Also note that if it _is_ an IIS log, or anything else remotely common, there's probably a module on the CPAN for parsing it, although I don't happen to know of a specific module for that, and a quick search didn't turn up anything obvious.
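    On Perl 5.10 or later, the same multi-field idea can be written with named captures, so the fields no longer depend on counting parentheses. This is a sketch under the same caveat as above: it assumes the log lines look like the sample line from this thread, and the capture names simply reuse the variable names from the regex above.

```perl
use 5.010;    # named captures and %+ need Perl 5.10 or later
use strict;
use warnings;

my $whole_event_string =
    '1.2.3.4 - Unauth [09/Oct/2003: 10:12:06 -0700] "GET / HTTP/1.1" 200 1979';

# Same fields as the regex above, but each capture has a name instead of a position
my %field;
if ($whole_event_string =~ m{
        ^ (?<srcip> \d+\.\d+\.\d+\.\d+ ) \s+
        \S+ \s+                          # the "-" field, not captured
        (?<user> \S+ ) \s+
        \[ (?<timestamp> .*? ) \] \s+
        " (?<request> .*? ) " \s+
        (?<result> \d+ )
    }x)
{
    %field = %+;    # copy the named captures before another match replaces them
}

print "$_ = $field{$_}\n" for sort keys %field;
```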


    Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. You can just call me "Mister Sanity". Why, I've got so much sanity it's driving me crazy.
Re: Need advice on PERL
by Anonymous Monk on Dec 12, 2006 at 08:52 UTC
    ???
    my $whole_event_string =
        '1.2.3.4 - Unauth [09/Oct/2003: 10:12:06 -0700] "GET / HTTP/1.1" 200 1979';

    my ($srcip) = $whole_event_string =~ /^(\d+\.\d+\.\d+\.\d+)/;

    die 'I read perlintro ', $srcip;
    __END__
    I read perlintro 1.2.3.4 at - line 6.