Dru has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

Hashes are not my strong point (as you can probably tell from my code below), but I'm determined to use them (I've been avoiding them like the plague up to this point). I've written this script to pick out the unique IPs in a file and count the number of times each one appears, but it's not working. Can someone help me out?

Thanks,
Dru
# Script to go through file and count the number
# of times an ip appears;
use strict;
use warnings;

my $file = 'd:\temp\file.csv';
my %hash;

open (FILE, $file) or die "Can't open $file: $!\n";

while (<FILE>){
    my($src, $dst, $rule1) = (split /,/)[1];
    $hash{$src} = $src;
    next if $src =~ /$hash{$src}/;
}

my $count;
for (keys %hash){
    while (<FILE>){
        if ($_ =~ /$hash{$_}/){
            $count++;
        }
        $hash{$_}{times} = $count;
    }
}

for my $ip (keys %hash){
    print "$ip was seen $hash{$ip} times\n";
}

Replies are listed 'Best First'.
Re: Counting the Number of Times an IP Appears
by atcroft (Abbot) on Feb 19, 2004 at 22:39 UTC

    Would it not be easier to say something to the effect of (*warning: untested code*):

    #!/usr/bin/perl -w
    use strict;

    my $file = 'd:/temp/file.csv';
    my %hash;

    open(FILE, $file) or die("Can't open $file: $!\n");
    while (<FILE>) {
        my ($src, $duh) = split /,/, $_, 2;
        $hash{$src}++;
    }
    close(FILE);

    foreach my $ip (sort({$hash{$a} <=> $hash{$b}} keys %hash)) {
        print "$ip was seen $hash{$ip} times\n";
    }

    The code assumes that you only want the first item on each line and discards the rest. When you do $hash{$src}++, a key that doesn't exist yet is created (its undefined value counts as zero) and incremented to 1; a key that already exists just has its count incremented.
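To make the $hash{$src}++ behaviour concrete, here is a minimal sketch. The sub name count_items and the sample addresses are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Count how often each value occurs in a list.
sub count_items {
    my @items = @_;
    my %seen;
    # A missing key starts out undefined; ++ treats that as 0 and stores 1,
    # and it does so without an "uninitialized value" warning.
    $seen{$_}++ for @items;
    return %seen;
}

# Made-up addresses, for illustration only.
my %count = count_items(qw(10.0.0.1 10.0.0.2 10.0.0.1));
print "$_ => $count{$_}\n" for sort keys %count;
```

No exists check or explicit initialisation to 0 is needed before the increment, which is why the counting loop can be a single statement.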

    Hope that helps....

Re: Counting the Number of Times an IP Appears
by borisz (Canon) on Feb 19, 2004 at 22:33 UTC
    untested, but this should do the trick.
    #!/usr/bin/perl
    # Script to go through file and count the number
    # of times an ip appears;
    use strict;
    use warnings;

    my $file = 'd:\temp\file.csv';
    my %hash;

    open (FILE, $file) or die "Can't open $file: $!\n";

    while (<FILE>){
        # split the current line ($_) into at most 3 fields
        my($src, $dst, $rule1) = split /,/, $_, 3;
        $hash{$src}++;
    }

    for my $ip (keys %hash){
        print "$ip was seen $hash{$ip} times\n";
    }
    Update: For some unknown reason I reposted the original code instead of mine. So here it is
    Boris
Re: Counting the Number of Times an IP Appears
by borisz (Canon) on Feb 19, 2004 at 22:46 UTC
    Or as a one-liner:
    perl -lne '$h{$1}++ if /^((?:\d+\.)+\d+)/; END{ print "$_ was seen $h{$_} times" for (sort keys %h)}' d:\temp\file.csv
    Boris
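Spelled out as a script, the one-liner above does roughly the following. This is only a sketch: the sub name leading_ip and the sample lines are invented for illustration, and the pattern is the one from the one-liner, not a strict IPv4 validator.

```perl
use strict;
use warnings;

# Return the dotted run of digits at the start of a line, or undef if none.
# Same pattern as the one-liner; it is loose and would also accept "1.2"
# or "999.1.2.3.4", so it is not a strict IPv4 check.
sub leading_ip {
    my ($line) = @_;
    return $line =~ /^((?:\d+\.)+\d+)/ ? $1 : undef;
}

# Made-up sample lines, for illustration only.
my @lines = ('10.0.0.1,80,allow', 'no address here', '10.0.0.1,443,deny');

my %h;
for my $line (@lines) {
    my $ip = leading_ip($line);
    $h{$ip}++ if defined $ip;   # lines without a leading address are skipped
}
print "$_ was seen $h{$_} times\n" for sort keys %h;
```

Unlike the split-based versions, this only counts lines that actually start with something address-like, so header rows or blank lines do not end up as hash keys.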
Re: Counting the Number of Times an IP Appears
by QM (Parson) on Feb 19, 2004 at 22:58 UTC
    Rule #1: Please state how it's not working.

    Rule #2: Please show us sample input data.

    Lots of little problems, but let's focus on the important stuff first.

    my($src, $dst, $rule1) = (split /,/)[1];
    (split /,/)[1] returns the element with index 1 (2nd element) in the list that split returns. If you really want $src to be the 1st element (index 0), use (split /,/)[0]. (See split for other options.)

    Since you don't need the other vars, that reduces to

    my $src = (split /,/)[0];
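A small sketch of how the list slice behaves. The helper field_at and the sample line are hypothetical, chosen only to show which element each index picks.

```perl
use strict;
use warnings;

# Return the field at position $index (0-based) of a comma-separated line.
sub field_at {
    my ($line, $index) = @_;
    my $field = (split /,/, $line)[$index];
    return $field;
}

# Hypothetical log line: rule number, source, destination.
my $line = '7,10.0.0.1,192.168.0.5';

my $first  = field_at($line, 0);   # '7'
my $second = field_at($line, 1);   # '10.0.0.1'
print "index 0: $first, index 1: $second\n";

# Assigning a one-element slice to three variables fills only the first;
# the remaining variables stay undefined.
my ($src, $dst, $rule1) = (split /,/, $line)[1];
print defined $dst ? "dst is set\n" : "dst is undef\n";
```

With the original three-variable assignment, only $src ever receives a value, which is why the extra variables can simply be dropped.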
    ---
    $hash{$src} = $src;
    The hash value can contain anything; it doesn't need to repeat the hash key. I would rewrite this as
    $hash{$src}++; # count occurrences
    ---
    next if $src =~ /$hash{$src}/;
    This is useless, as it's the last statement in the loop, and $hash{$src} has just been set to $src on the line before.

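    There's no reason to step through the file twice, as you already have all of the info stored. (The second while (<FILE>) loop would not read anything anyway, because the filehandle is already at end-of-file after the first loop.) Instead of all the rest, do this: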

    foreach (keys %hash) { print "$_ was seen $hash{$_} times\n"; }
    Cheers,

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      Thank you, everyone, for your help. Here's my final code, which does what I need.

      # Script to go through file and count the number of
      # times an ip appears;
      use strict;
      use warnings;

      my $file = 'd:\temp\text.csv';
      my %hash;

      open (FILE, $file) or die "Can't open $file: $!\n";

      while (<FILE>){
          my($src) = (split /,/)[1];
          $hash{$src}++;
      }

      foreach my $ip (sort({$hash{$b} <=> $hash{$a}} keys %hash)) {
          print "$ip appears $hash{$ip} times\n";
      }
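For comparison, here is one possible variation on the final script, as a sketch rather than a drop-in replacement. It assumes a lexical filehandle and three-argument open are acceptable, skips lines that have no value in the chosen column, and breaks ties alphabetically. The sub names count_column and report and the in-memory sample data are invented for this example.

```perl
use strict;
use warnings;

# Count how often each value appears in one column of comma-separated input.
# $fh is any readable filehandle; $index is the 0-based column number.
sub count_column {
    my ($fh, $index) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        my $value = (split /,/, $line)[$index];
        next unless defined $value && length $value;  # skip short or blank lines
        $count{$value}++;
    }
    return \%count;
}

# Return "value appears N times" lines, most frequent first (ties by name).
sub report {
    my ($count) = @_;
    return map  { "$_ appears $count->{$_} times" }
           sort { $count->{$b} <=> $count->{$a} or $a cmp $b }
           keys %$count;
}

# Demo on in-memory data (made up); a real run would open the CSV instead:
#   open my $fh, '<', $file or die "Can't open $file: $!\n";
my $data = "1,10.0.0.1,a\n2,10.0.0.2,b\n3,10.0.0.1,c\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!\n";
print "$_\n" for report(count_column($fh, 1));
close $fh;
```

Keeping the counting in a sub that takes a filehandle makes it easy to try on a few test lines before pointing it at the real file.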