cipher has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perlmonks, I have an array which contains lots of IP addresses.
my @ip = qw(1.1.1.1 2.2.2.2 1.1.1.1 4.4.4.4 1.1.1.1 4.4.4.4);
I want to find the top n unique IPs in this array, along with their counts. For example, with n = 2 the output should be:

IP -- Count
1.1.1.1 3
4.4.4.4 2
I have read many forums and posts, and I see that most Perl experts recommend using a hash for this purpose. I already have working code, but there the IP address is held in a scalar; my new code is different and keeps the IP addresses in an array. Here is my working code:
use strict;
use warnings;

my $fields = ();
my @logs = ();
my $reg_sip = ();
my $si = ();
my $sIp = ();
my $key = ();
my %sIp = ();

open FILE, "c:/perl/fw.log" or die $!;
while ($fields = <FILE>) {
    @logs = split (/ /,$fields);
    foreach $reg_sip (@logs) {
        if ($reg_sip =~ m/src=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/) {
            $si = substr $reg_sip, 4;
            $sIp{$si} += 1;
        }
    }
}

my $top = 10;
print "\nTop Ten Sources are\n";
my $i =0;
foreach $key (sort {$sIp{$b} <=> $sIp{$a}} keys %sIp) {
    print "\n$key\t - $sIp{$key}";
    $i++;
    if ($i == $top){last;}
}

## Result
Top Ten Sources are
10.71.74.100 - 2706
192.168.237.103 - 1896
192.168.237.119 - 1473
10.71.74.73 - 1302
192.168.237.110 - 650
10.71.74.139 - 325
10.71.74.74 - 238
192.168.33.32 - 213
192.168.237.247 - 124
192.168.237.250 - 18
I want to achieve similar output but from array. Example:
use strict;
use warnings;

my @srcs = ();
open FILE, "c:/perl/fw.log" or die $!;
while (<FILE>) {
    if (my ($src) =/\bsrc=(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/){push @srcs, $src;}
}

#### Code to print Top n unique ip's in descending order from @srcs.

Replies are listed 'Best First'.
Re: Top n Unique count from Array
by davido (Cardinal) on Mar 25, 2011 at 08:20 UTC

    How about this?

    use Modern::Perl;

    my %seen;
    while( <DATA> ) {
        chomp;
        $seen{$_}++;
    }

    my @top = ( sort { $seen{$b} <=> $seen{$a} } keys %seen )[0,1];
    foreach ( @top ) {
        say "$_ $seen{$_}";
    }

    __DATA__
    1.1.1.1
    2.2.2.2
    1.1.1.1
    4.4.4.4
    1.1.1.1
    4.4.4.4

    Dave
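
The slice [0,1] above is fixed at two results and yields undef entries when fewer than two distinct keys exist. A minimal sketch of the same count-then-sort idea for an arbitrary n follows. The helper name top_n and the tie-break on the IP string are assumptions made for illustration; neither appears in the thread.

```perl
use strict;
use warnings;

# top_n is a hypothetical helper name, not part of the thread's code.
# It returns up to $n [ip, count] pairs, highest count first.
sub top_n {
    my ($n, @items) = @_;

    # Tally every item, as in the %seen loop above.
    my %seen;
    $seen{$_}++ for @items;

    # Sort by count descending; break ties by IP string (an assumption)
    # so that equal counts always come out in the same order.
    my @sorted = sort { $seen{$b} <=> $seen{$a} or $a cmp $b } keys %seen;

    # Clamp $n so the slice never runs past the end of the list.
    $n = @sorted if $n > @sorted;

    return map { [ $_, $seen{$_} ] } @sorted[ 0 .. $n - 1 ];
}

my @ip = qw(1.1.1.1 2.2.2.2 1.1.1.1 4.4.4.4 1.1.1.1 4.4.4.4);
printf "%s %d\n", @$_ for top_n(2, @ip);
```

With the sample array from the question this prints 1.1.1.1 3 followed by 4.4.4.4 2. Asking for more entries than there are distinct addresses simply returns all of them.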

      UPDATE: Looks like I found the solution. This one is working for me:

      my $top = 10;
      my $i = 0;
      my %cnt_hash;
      $cnt_hash{$_}++ foreach @srcs;
      foreach my $key (sort {$cnt_hash{$b} <=> $cnt_hash{$a}} keys %cnt_hash) {
          print "$key\t$cnt_hash{$key}\n";
          $i++;
          if ($i == $top){last;}
      }
Re: Top n Unique count from Array
by JavaFan (Canon) on Mar 25, 2011 at 10:13 UTC
    I want to achieve similar output but from array.
    But the first thing your initial program does when reading in a line is to turn it into an array!
    if (my ($src) =/\bsrc=(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/){push @srcs, $src;}
    So your question is: "I replaced @logs = split (/ /,$fields); with a similar line that uses @srcs instead of @logs; how do I continue?" Maybe the easiest fix is to add @logs = @srcs;?
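
Since both versions of the program walk the log line by line, the array and the tally can be filled in the same pass. The sketch below assumes a log format known only from the src= pattern in the question; the sample lines and the in-memory filehandle are illustrative stand-ins for c:/perl/fw.log, not real data.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for c:/perl/fw.log; the real
# log format is only known from the src= pattern in the question.
my $sample = join "\n",
    'action=accept src=10.0.0.1 dst=10.0.0.9',
    'action=drop src=10.0.0.2 dst=10.0.0.9',
    'action=accept src=10.0.0.1 dst=10.0.0.8',
    '';

# Read from an in-memory filehandle so the sketch runs without the log file.
open my $fh, '<', \$sample or die "Cannot open sample: $!";

my (@srcs, %count);
while (my $line = <$fh>) {
    if (my ($src) = $line =~ /\bsrc=(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/) {
        push @srcs, $src;    # keep every occurrence for later use
        $count{$src}++;      # and tally it in the same pass
    }
}
close $fh;

print scalar(@srcs), " addresses, ", scalar(keys %count), " unique\n";
```

After the loop, @srcs is still available for other purposes and %count can be sorted by value exactly as %sIp is in the original program.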
Re: Top n Unique count from Array
by Anonymous Monk on Mar 25, 2011 at 07:36 UTC
    I want to achieve similar output but from array.

    Why?

    Where is your attempt?

      Why?
      I want to use the IP addresses multiple times for various purposes in my script, so I am pushing them into an array.

      Where is your attempt?
      Tried this, following an example but output is not as I want it.
      Output is not sorted and where do I specify that I only want top n unique values.
      %seen = ();
      foreach $item (@srcs) {
          push(@uniq, $item) unless $seen{$item}++;
      }
      print "@uniq\n";

      This one only gives the unique IP addresses and not the count.

      my %count;
      foreach my $ip ( @srcs ) {
          $count{$ip}++;
      }
      my @uniq = sort keys %count;
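
Both attempts above already build the needed counts; they just never print them. A minimal sketch, reusing the sample array from the question, shows that the %seen hash from the first attempt maps each address to its total once the loop finishes:

```perl
use strict;
use warnings;

my @srcs = qw(1.1.1.1 2.2.2.2 1.1.1.1 4.4.4.4 1.1.1.1 4.4.4.4);

my (%seen, @uniq);
for my $item (@srcs) {
    # Post-increment returns the old value, so the push happens only
    # the first time an address is seen, yet every occurrence is counted.
    push @uniq, $item unless $seen{$item}++;
}

# @uniq holds the addresses in first-seen order;
# %seen now maps each address to its total count.
print "$_ $seen{$_}\n" for @uniq;
```

This prints each address once, in first-seen order, with its count. Sorting the keys by $seen{$b} <=> $seen{$a} and stopping after n entries then gives the "top n" listing.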
        I want to achieve similar output but from array.

        Why?

        I want to use the IP addresses multiple times for various purposes in my script, so I am pushing them into an array.

        The question was why do you want an array?

        Tried this, following an example but output is not as I want it. Output is not sorted and where do I specify that I only want top n unique values.

        But why did you try that?

        Why should that code produce something sorted?

        What series of steps does the code perform to produce a sorted array of IPs and hits?

        Also, what do you mean where?

        Please think about it, and try to answer these questions out loud; speak the answer to your monitor or an object on your desk.

        What I would do is redirect the output of the first program to a file, then run head --lines=N file to get the top N results.