ewhitt has asked for the wisdom of the Perl Monks concerning the following question:

I have been using this code to sort unique IP addresses out of an array.
@uniqueIPs = sort keys %{ { map { $_, 1 } @ipAddresses } };
How could I modify it to display how many instances of each IP address are found as well?

Thanks!

Replies are listed 'Best First'.
Re: Counting unique instances from Array Sort
by Skeeve (Parson) on Jan 07, 2008 at 09:22 UTC
    The answer is given more than once here. But again:
    my %ip_count; ++$ip_count{$_} for (@ipAddresses);

    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
      I am not sure if I follow. What I am trying to do is take:
      my @ipAddresses = ("172.16.16.1 ", "172.16.16.1 ", "172.16.16.1 ", "172.16.16.2 ");
      and be able to print something like this while / after the sort:
      172.16.16.1,3 172.16.16.2,1
      Thanks!
Re: Counting unique instances from Array Sort
by ikegami (Patriarch) on Jan 07, 2008 at 09:45 UTC

    The idiomatic way of eliminating duplicates creates a hash of counts as a byproduct.

    my %counts;
    my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;
    for my $ip (sort @uniqueIPs) {
        print("$ip ($counts{$ip})\n");
    }
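A minimal sketch of the same idiom applied to the sample data from the question, printing the "address,count" format that was asked for. Stripping the trailing spaces from each address is an assumption on my part (the sample strings end in a space); drop that line if the spaces are significant.

```perl
use strict;
use warnings;

# Sample data from the question. Assumption: the trailing spaces are
# noise, so they are stripped before counting.
my @ipAddresses = ("172.16.16.1 ", "172.16.16.1 ", "172.16.16.1 ", "172.16.16.2 ");
s/\s+\z// for @ipAddresses;

# grep keeps an address only the first time it is seen;
# %counts ends up holding the number of occurrences of each one.
my %counts;
my @uniqueIPs = grep { !$counts{$_}++ } @ipAddresses;

# Build "address,count" pairs in sorted (string) order.
my $report = join ' ', map { "$_,$counts{$_}" } sort @uniqueIPs;
print "$report\n";    # 172.16.16.1,3 172.16.16.2,1
```

Note that a plain sort compares the addresses as strings, which is fine for this sample but does not order arbitrary IP addresses numerically.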

      Okay, I've seen and used this a hundred times, but I have to come clean...I have no idea what is going on "under the hood" with this:

      my %counts; my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;

      I'd like to have someone explain it to me in a sentence or two. But let me take a stab at it first:
      1. The first element in the list @ipAddresses becomes the first key in the hash %counts.
      2. And because the first element of @ipAddresses ($_) is not equal to the first element of @uniqueIPs (mainly because it's empty), the first element of @ipAddresses ($_) is "pushed" onto @uniqueIPs.
      3. HERE'S WHERE I GET LOST: What is incrementing that value? The fact that it satisfies the condition of not matching, therefore being true? And let's say the next element in @ipAddresses is the same as the first, and because it IS equal to the first element of @uniqueIPs it is *not* pushed. QUESTION: Why would the value of that first key get incremented?

      So, is the condition asking "if it is not equal then increment it?"

      What am I not getting? Thanks.

      —Brad
      "The important work of moving the world forward does not wait to be done by perfect men." George Eliot
        my @b = grep EXPR, @a;

        is equivalent to

        my @b;
        foreach (@a) {
            if (EXPR) {
                push @b, $_;
            }
        }

        In this case:

        my @uniqueIPs;
        foreach (@ipAddresses) {
            if (!$counts{$_}++) {
                push @uniqueIPs, $_;
            }
        }

        As for post-incrementing,

        $x++

        is equivalent to

        my $orig_x = $x; ++$x; $orig_x

        In this case:

        my @uniqueIPs;
        foreach (@ipAddresses) {
            my $old_count = $counts{$_};
            $counts{$_}++;               # Add one to count.
            if (!$old_count) {           # If it's the first time we've seen it,
                push @uniqueIPs, $_;     # save it.
            }
        }

        So,

        my %counts; my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;

        is short for

        my @uniqueIPs;
        foreach (@ipAddresses) {
            $counts{$_}++;               # Add one to count.
            if ($counts{$_} == 1) {      # If it's the first time we've seen it,
                push @uniqueIPs, $_;     # save it.
            }
        }

        It is dense, but it's tried and true. Just add a comment for the less learned readers.

        # Remove duplicate IP addresses by counting
        # the number of times each address occurs.
        my %counts;
        my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;

        Oh by the way, you could also write it as follows if it's less confusing:

        # Remove duplicate IP addresses by counting
        # the number of times each address occurs.
        my %counts;
        my @uniqueIPs = grep ++$counts{$_} == 1, @ipAddresses;
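A short sketch that runs the one-line idiom and the expanded loop side by side, to check that they agree. The sample addresses and the names %loop_counts and @loop_unique are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample data with duplicates in mixed order.
my @ipAddresses = qw(10.0.0.2 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.1 10.0.0.2);

# The one-line idiom.
my %counts;
my @uniqueIPs = grep { !$counts{$_}++ } @ipAddresses;

# The expanded loop (hypothetical names, same logic).
my (%loop_counts, @loop_unique);
for my $ip (@ipAddresses) {
    my $old_count = $loop_counts{$ip}++;        # post-increment returns the old value
    push @loop_unique, $ip unless $old_count;   # old value 0 means first sighting
}

print "idiom: @uniqueIPs\n";
print "loop:  @loop_unique\n";
print "$_ => $counts{$_}\n" for sort keys %counts;
```

Both versions keep the addresses in order of first appearance, and the increment happens on every element whether or not that element is kept.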
        This question deserves a longer and clearer answer than I have time to give, but the first step to understanding this idiomatic piece of code is realizing that the answer to the question
        Why would the value of that first key get incremented?
        is that incrementation (and post-incrementation at that, which is the other key to figuring out the idiom) happens because it is always explicitly applied to the value of the current key of the %counts hash; there is nothing whatsoever conditional about the incrementation.
        Ok - here is a functional decomposition of "my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;":
        my @uniqueIPs;   # Just declare it as an empty array.

        # The resulting value in %counts is equivalent to the execution of this statement:
        $counts{$_}++ for @ipAddresses;
        # This is the important piece.
        # The hash %counts uses each IP address as a key.
        # The first time an IP address is encountered, the key-value pair is created,
        # with the missing value treated as zero; the (++) increments that to 1.
        # The next time that IP is encountered, the value is incremented again. So, at the end,
        # the value for each key contains the count of occurrences for that IP address (key).

        ... grep !$counts{$_}++, @ipAddresses
        # The "grep" tests each element in turn ($counts{$_} is the VALUE for that element),
        # and the negation (!) turns a zero value into true. Because the operator used is
        # post-increment ($blah++), the value returned before negation will be zero for
        # first-seen IP addresses only. For second and subsequent sightings of the IP,
        # the pre-negation value will be non-zero, and post-negation will be false.
        # "grep" keeps only the elements whose negated value is true, effectively
        # returning all the KEYS of %counts, which is all the unique IPs.

             "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom

Re: Counting unique instances from Array Sort
by ambrus (Abbot) on Jan 07, 2008 at 09:26 UTC
Re: Counting unique instances from Array Sort
by locked_user sundialsvc4 (Abbot) on Jan 07, 2008 at 19:02 UTC

    The essential idea here is that we are using a hash to keep track of the counts. A hash, as you well know, is a very efficient way to look up a moderate number of values based on a “key” such as IP-address.

    Now here's where the coder decided to take advantage of one of Perl's many “shortcuts.” He “knew” that if you increment ("++") a hash-key that doesn't exist yet, Perl will “helpfully” treat that key as though it did exist and had the value zero.

    As for me, I don't like to see code like that. The case where a particular key does not yet exist in a hash-structure is logically distinct from the case where it does. Therefore, I prefer to see that distinction expressly taken care of within the code, even if the resulting code is “inefficient.”

    So I prefer to have something very pedantic, like: (complete with comments!) ...

    # Maintain a running count of all the unique IP-addresses seen ...
    my %ip_occurs;
    foreach my $key (@address_list) {
        if (defined($ip_occurs{$key})) {
            $ip_occurs{$key}++;     # seen again ...
        } else {
            $ip_occurs{$key} = 1;   # first time ...
        }
    }

    (The above code has been edited to fix an obvious dmub tpyo ...)

    Notice that it does not matter in what order the keys are scanned when putting them into the hash-table. There is no reason to sort the keys in this loop. Instead, you will sort the keys when you extract them from the hash-table:

    foreach my $key (sort keys %ip_occurs) {
        print "$key occurs $ip_occurs{$key} times\n";
    }
    (Caution: extemporaneous Perl! Not responsible for tpy0s...)
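The shortcut described in this post (incrementing a hash key that does not exist yet) can be checked with a small sketch. The hash %explicit and the sample address are hypothetical names used only for this illustration; the explicit branch uses exists, which is one way to make the "key not present yet" case visible.

```perl
use strict;
use warnings;

my %ip_occurs;
my $ip = "172.16.16.1";    # made-up sample address

# Before the first increment the key is simply not there.
my $present_before = exists $ip_occurs{$ip} ? 1 : 0;

# The shortcut: ++ treats the missing value as 0 and emits no
# "uninitialized value" warning, even under "use warnings".
$ip_occurs{$ip}++;
$ip_occurs{$ip}++;

# The explicit version: handle the "not seen yet" case by hand.
my %explicit;
for my $key ($ip, $ip) {
    if (exists $explicit{$key}) {
        $explicit{$key}++;      # seen again
    } else {
        $explicit{$key} = 1;    # first time
    }
}

print "present before: $present_before\n";
print "shortcut: $ip_occurs{$ip}, explicit: $explicit{$ip}\n";
```

Both approaches end with the same count; the difference is only in whether the first-sighting case is spelled out in the code.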