indkebr has asked for the wisdom of the Perl Monks concerning the following question:

Hi Experts, I'm sure this will be an easy one for you all, and I would truly appreciate some help. I am an extreme newbie with Perl and have been writing a script with the help of a friend. Unfortunately my friend, who helped me write the majority of this script, was just shipped overseas to Iraq, and now I'm lost. The start of my script works fine, but I am stuck on the next piece. My goal is to grep all IPs from a log file and then count how many times each IP appears. If 1.2.3.4 is found in the log file twice, I would like to display 1.2.3.4 along with its count in my print statement/CSV file. I could do this if it were the script's only function, but the script does a lot more than that, so I need to incorporate this into the existing script. Below is the script so far. Now I need help with adding the next section to the script, the IP count.
#!/usr/bin/perl -w
use strict;

my %domains;
my @domain_array = qw(ebay.com paypal.com americanbank.com usbank.com americangreetings.com);
my $log_domain;
my $host_test;
my $host_cmp;
my $host;
my @bad_domains;
my $spf;
my $dkim;

for (@domain_array) {
    $domains{$_}{"counter"} = 0;
    $domains{$_}{"dkim0"} = 0;
    $domains{$_}{"dkim1"} = 0;
    $domains{$_}{"dkim2"} = 0;
    $domains{$_}{"dkim3"} = 0;
    $domains{$_}{"dkim4"} = 0;
    $domains{$_}{"spf0"} = 0;
    $domains{$_}{"spf1"} = 0;
    $domains{$_}{"spf2"} = 0;
    $domains{$_}{"spf3"} = 0;
    $domains{$_}{"spf4"} = 0;
}

open ($log_domain, "logdata") || die "$!";
while (<$log_domain>) {
    ($host) = $_ =~ /domain=([\w\.]+?)\s/;   # find regex for domain
    ($spf)  = $_ =~ /spf=([0-4])\s/;         # find regex for spf1
    ($dkim) = $_ =~ /dkim=([0-4])\s/;        # find regex for dkim1
    ### IP Regex - I'm assuming the regex for this is --
    ### ($IP) = $_ =~ /ip=(([0-1]?[0-9]{1,2}\.)|(2[0-4][0-9]\.)|(25[0-5]\.)){3}(([0-1]?[0-9]{1,2})|(2[0-4][0-9])|(25[0-5]))\s/;
    $host_test = 0;
    foreach $host_cmp (keys %domains)        # pull each key from domains hash
    {
        if ($host =~ $host_cmp)              # if host equals domain in hash, increment counter
        {
            $domains{$host}{"counter"}++;
            $domains{$host}{"spf$spf"}++;
            $domains{$host}{"dkim$dkim"}++;
            $host_test = 1;                  # test to ensure domain is present
        }
    }
    if ($host_test == 0)                     # if domain is not in array
    {
        push (@bad_domains, $host);
    }
}

print "Domain,\"Domain Count\",Dkim0,Dkim1,Dkim2,Dkim3,Dkim4,Spf0,Spf1,Spf2,Spf3,Spf4\n";
foreach $host_cmp (keys %domains) {
    print "$host_cmp,";
    print $domains{$host_cmp}{"counter"}.",";
    print $domains{$host_cmp}{"dkim0"}.",";
    print $domains{$host_cmp}{"dkim1"}.",";
    print $domains{$host_cmp}{"dkim2"}.",";
    print $domains{$host_cmp}{"dkim3"}.",";
    print $domains{$host_cmp}{"dkim4"}.",";
    print $domains{$host_cmp}{"spf0"}.",";
    print $domains{$host_cmp}{"spf1"}.",";
    print $domains{$host_cmp}{"spf2"}.",";
    print $domains{$host_cmp}{"spf3"}.",";
    print $domains{$host_cmp}{"spf4"}."\n";
}
print "The total amount of domains that we don't care about is ".($#bad_domains+1)."\n";
close ($log_domain);

Replies are listed 'Best First'.
Re: IP Parse and Count from logfile
by mscharrer (Hermit) on Sep 17, 2008 at 16:09 UTC
    Hi indkebr,
    some tips for your future perl code:
    • Use more loops, especially for initialisations
    • Avoid hardcoding of names ala spf2 if not needed, use e.g. a constant array (see below)
    • To compare two strings use eq not =~
    • The easiest thing to test if a name exists in a list is to store it as hash (as you did) and use the exists operator (see below)
    • Use local scope variables, especially for loops, e.g. foreach my $var (@vars)
    • Use the readmore tags on perlmonks for medium or large code which isn't needed to present your basic question.
    • For some things you can use hash (or array) slices: my %hash; @hash{"key1","key2","key5"} = (0,0,0);
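
A minimal sketch of how a few of these tips might fit together, using the domain list and counter names from the original script. The `@levels` array, the `@counter_names` list, and the `is_known_domain` helper are illustrative names, not part of the original code.

```perl
use strict;
use warnings;

# Domain list taken from the original script.
my @domain_array = qw(ebay.com paypal.com americanbank.com usbank.com americangreetings.com);

# Hypothetical list of result levels, replacing hardcoded spf0..spf4 / dkim0..dkim4.
my @levels = (0 .. 4);

# Build the counter names once instead of writing them out by hand.
my @counter_names = ('counter', map { ("dkim$_", "spf$_") } @levels);

my %domains;
for my $domain (@domain_array) {
    # Hash slice: set every counter for this domain to 0 in one statement.
    @{ $domains{$domain} }{@counter_names} = (0) x @counter_names;
}

# Illustrative helper: an exact lookup with exists instead of a regex match.
sub is_known_domain {
    my ($host) = @_;
    return exists $domains{$host} ? 1 : 0;
}

print is_known_domain('ebay.com')    ? "known\n" : "unknown\n";
print is_known_domain('notebay.com') ? "known\n" : "unknown\n";
```

With `exists`, a host such as `notebay.com` is not treated as known, whereas a `=~` match against `ebay.com` would also succeed on it because the pattern matches anywhere in the string.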

    I changed your code a little to improve it and added the IP counter and print commands. I'm not sure I understood exactly what you need, so it might not be correct.

    Please note that I didn't check your IP regex, but it looks OK at first glance. You didn't provide a test input file, so I couldn't test whether my changes introduced functional bugs.


Re: IP Parse and Count from logfile
by thundergnat (Deacon) on Sep 17, 2008 at 18:16 UTC

    That IP regex is not going to work for you. You've got too many capturing groups. (It isn't invalid, it just won't return what you think it does.) As it stands it will return the third digit group and a full stop.

    I.E.
    my $address = 'ip=111.112.113.114 ';
    my ($ip) = $address =~ /ip=(([0-1]?[0-9]{1,2}\.)|(2[0-4][0-9]\.)|(25[0-5]\.)){3}(([0-1]?[0-9]{1,2})|(2[0-4][0-9])|(25[0-5]))\s/;
    print $ip;
    will print
    113.
    You would be better off using something like this:
    my $tuple = qr'[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]';
    my $address = 'ip=111.112.113.114 ';
    my ($ip) = $address =~ /ip=(($tuple\.){3}$tuple)\s/;
    print $ip;
    Oh, and I pretty much agree with everything mscharrer said. See my take on it.
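
To illustrate the point about capturing groups, here is a small sketch that wraps a pattern like the one suggested above in a helper. The name `extract_ip` and the sample log lines are made up for illustration, and the `ip=` field format is assumed from the original post. It uses non-capturing groups `(?:...)`, so the only capture is the full address.

```perl
use strict;
use warnings;

# One octet, using the same alternatives as the pattern suggested above.
my $octet = qr/[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]/;

# Hypothetical helper: returns the address after "ip=", or undef if none is found.
sub extract_ip {
    my ($line) = @_;
    # (?:...) groups without capturing, so $1 is the whole dotted quad.
    return $line =~ /ip=((?:$octet\.){3}$octet)\s/ ? $1 : undef;
}

# Made-up sample lines in the assumed "key=value " format.
for my $line ('domain=ebay.com ip=111.112.113.114 spf=1 ', 'domain=ebay.com spf=1 ') {
    my $ip = extract_ip($line);
    print defined $ip ? "$ip\n" : "no ip found\n";
}
```

Returning `undef` for lines without an address lets the caller skip them explicitly instead of counting an empty value.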
Re: IP Parse and Count from logfile
by dsheroh (Monsignor) on Sep 17, 2008 at 16:03 UTC
    The counting is easy enough: Just add a new hash, my %ip_addresses; and then, after you get the ip address into $IP, do a $ip_addresses{$IP}++; to increment the count of how many times the address has been seen.

    To print out the list of seen addresses with their counts, use

    for my $addr (sort keys %ip_addresses) {
        print $addr, ': ', $ip_addresses{$addr}, "\n";
    }
    BTW, you appear to be missing the declaration for $IP (my $IP;), which will be needed for the code to run under strict once the IP regex is uncommented.

    (Code fragments not tested, but simple enough that they should work... (Famous last words, I know.))
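
Putting the counting advice together, a self-contained sketch might look like the following. The sample log lines, the `count_ips` helper, and the CSV header are assumptions made for illustration; the real script would read from its `logdata` file handle instead. The simple `[\d.]+` pattern is a stand-in, not the stricter octet check discussed in the other reply.

```perl
use strict;
use warnings;

# Hypothetical helper: counts how often each ip= value occurs in a list of log lines.
sub count_ips {
    my @lines = @_;
    my %ip_addresses;
    for my $line (@lines) {
        # Simplified pattern (assumption): digits and dots after "ip=".
        my ($IP) = $line =~ /ip=([\d.]+)\s/;
        next unless defined $IP;    # skip lines without an address
        $ip_addresses{$IP}++;
    }
    return %ip_addresses;
}

# Made-up sample data in the assumed log format.
my @sample = (
    "domain=ebay.com ip=1.2.3.4 spf=1 dkim=0 \n",
    "domain=paypal.com ip=5.6.7.8 spf=2 dkim=1 \n",
    "domain=ebay.com ip=1.2.3.4 spf=1 dkim=0 \n",
);

my %ip_addresses = count_ips(@sample);

# CSV output, one address per row, sorted by address string.
print "IP,Count\n";
for my $addr (sort keys %ip_addresses) {
    print "$addr,$ip_addresses{$addr}\n";
}
```

In the existing script, the same `$ip_addresses{$IP}++` line would sit inside the `while (<$log_domain>)` loop, and the printing loop would follow the domain report.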