in reply to Re: Counting unique instances from Array Sort
in thread Counting unique instances from Array Sort

Okay, I've seen and used this a hundred times, but I have to come clean...I have no idea what is going on "under the hood" with this:

my %counts;
my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;

I'd like to have someone explain it to me in a sentence or two. But let me take a stab at it first:
1. The first element in the list @ipAddresses becomes the first key in the hash %counts.
2. And because the first element of @ipAddresses ($_) is not equal to the first element of @uniqueIPs (mainly because it's empty), the first element of @ipAddresses ($_) is "pushed" onto @uniqueIPs.
3. HERE'S WHERE I GET LOST: What is incrementing that value? The fact that it satisfies the condition of not matching, therefore being true? And let's say the next element in @ipAddresses is the same as the first, and because it IS equal to the first element of @uniqueIPs it is *not* pushed. QUESTION: Why would the value of that first key get incremented?

So, is the condition asking "if it is not equal then increment it?"

What am I not getting? Thanks.

—Brad
"The important work of moving the world forward does not wait to be done by perfect men." George Eliot

Re^3: Counting unique instances from Array Sort: explanation needed
by ikegami (Patriarch) on Jan 07, 2008 at 17:35 UTC
    my @b = grep EXPR, @a;

    is equivalent to

    my @b;
    foreach (@a) {
        if (EXPR) {
            push @b, $_;
        }
    }

    In this case:

    my @uniqueIPs;
    foreach (@ipAddresses) {
        if (!$counts{$_}++) {
            push @uniqueIPs, $_;
        }
    }

    As for post-incrementing,

    $x++

    is equivalent to

    do { my $orig_x = $x; ++$x; $orig_x }
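To see the difference between the two forms of increment, here is a minimal sketch. The variable names and the sample key '10.0.0.1' are made up for illustration only. It prints what each form returns, including the case of a hash key that does not exist yet.

```perl
use strict;
use warnings;

my $x = 5;
my $post = $x++;   # $post gets the old value (5); $x is now 6
my $pre  = ++$x;   # $x becomes 7 first; $pre gets the new value (7)

my %counts;
my $first  = $counts{'10.0.0.1'}++;   # key did not exist: returns 0, stores 1
my $second = $counts{'10.0.0.1'}++;   # returns 1, stores 2

print "post=$post pre=$pre x=$x\n";
print "first=$first second=$second stored=$counts{'10.0.0.1'}\n";
```

The first post-increment of a missing key returning 0 is what makes the negation true exactly once per key.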

    In this case:

    my @uniqueIPs;
    foreach (@ipAddresses) {
        my $old_count = $counts{$_};
        $counts{$_}++;             # Add one to count.
        if (!$old_count) {         # If it's the first time we've seen it,
            push @uniqueIPs, $_;   # save it.
        }
    }

    So,

    my %counts;
    my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;

    is short for

    my @uniqueIPs;
    foreach (@ipAddresses) {
        $counts{$_}++;               # Add one to count.
        if ($counts{$_} == 1) {      # If it's the first time we've seen it,
            push @uniqueIPs, $_;     # save it.
        }
    }

    It is dense, but it's tried and true. Just add a comment for the less learned readers.

    # Remove duplicate IP addresses by counting
    # the number of times each address occurs.
    my %counts;
    my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;

    Oh by the way, you could also write it as follows if it's less confusing:

    # Remove duplicate IP addresses by counting
    # the number of times each address occurs.
    my %counts;
    my @uniqueIPs = grep ++$counts{$_} == 1, @ipAddresses;
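For anyone who wants to check the equivalence on real data, here is a small runnable sketch. The address list is hypothetical sample data, and %seen and @viaLoop are names invented for this example. It runs the one-line idiom and a spelled-out loop side by side.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my @ipAddresses = qw(10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.2);

# The one-line idiom.
my %counts;
my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;

# The spelled-out loop, using its own hash so the two runs do not interfere.
my %seen;
my @viaLoop;
foreach (@ipAddresses) {
    push @viaLoop, $_ if ++$seen{$_} == 1;   # keep only the first sighting
}

print "idiom: @uniqueIPs\n";
print "loop:  @viaLoop\n";
```

Both lines should list the same addresses, in the order each was first seen.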

      Brilliant. I think if I had all of Perl explained to me this way I'd be a saint in no time.

      Besides the overall deconstruction, rehearsing the pre/post incrementing has been helpful.

      Thanks NetWallah and ikegami!

      —Brad
      "The important work of moving the world forward does not wait to be done by perfect men." George Eliot

        You're welcome. In my case, it's "By explaining Perl this way, I became a saint in no time." :)

Re^3: Counting unique instances from Array Sort: explanation needed
by Anonymous Monk on Jan 07, 2008 at 14:20 UTC
    This question deserves a longer and clearer answer than I have time to give, but the first step to understanding this idiomatic piece of code is realizing that the answer to the question
    Why would the value of that first key get incremented?
    is that the increment (a post-increment at that, which is the other key to figuring out the idiom) is always explicitly applied to the value stored under the current key of the %counts hash; there is nothing whatsoever conditional about it.
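One way to convince yourself that the increment is unconditional is to record what the negation actually sees on each pass. This is only a sketch: the sample addresses and the @tested array are invented for the demonstration, and the block form of grep is used so the old value can be captured.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @ipAddresses = qw(10.0.0.1 10.0.0.1 10.0.0.2 10.0.0.1);

my %counts;
my @tested;   # the value the "!" was applied to, one entry per element
my @uniqueIPs = grep {
    my $old = $counts{$_}++;   # always increments, whatever the outcome
    push @tested, $old;
    !$old;                     # true only when the old count was 0
} @ipAddresses;

print "tested: @tested\n";
print "unique: @uniqueIPs\n";
print "$_ => $counts{$_}\n" for sort keys %counts;
```

Every element bumps its count, including the duplicates that grep rejects, which is why %counts ends up holding the full tally.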
Re^3: Counting unique instances from Array Sort: explanation needed
by NetWallah (Canon) on Jan 07, 2008 at 17:00 UTC
    Ok - here is a functional decomposition of "my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;":
    my @uniqueIPs;   # Just declare it as an empty array.

    # The resulting value in %counts is equivalent to the execution of this statement:
    $counts{$_}++ for @ipAddresses;
    # This is the important piece.
    # The hash %counts uses each IP address as a key.
    # The first time an IP address is encountered, the key-value pair is created,
    # starting from an undefined value that is treated as zero.
    # The (++) increments that to 1.
    # The next time that IP is encountered, the value is incremented again. So, at the end,
    # the value for each key contains the count of occurrences for that IP address (key).

    ... grep !$counts{$_}++, @ipAddresses
    # The "grep" tests each element's count ($counts{$_} is the VALUE), and the
    # negation (!) is true only when that value is zero.
    # Because the operator used is post-increment ($blah++), the value returned
    # before negation will be zero for first-seen IP addresses only. For second and
    # subsequent sightings of the IP, the pre-negation value will be non-zero, and
    # post-negation will be false.
    # "grep" keeps only the elements for which the expression is true, effectively
    # returning all the KEYS of %counts, which is all the unique IPs.
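The claim that the grep result matches the keys of %counts can be checked with a short sketch like the one below. The address list is made-up sample data, and @fromKeys and @fromGrep are names introduced just for the comparison. Both lists are sorted before comparing because keys come back in no particular order.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @ipAddresses = qw(192.168.0.7 192.168.0.3 192.168.0.7
                     192.168.0.9 192.168.0.3 192.168.0.7);

my %counts;
my @uniqueIPs = grep !$counts{$_}++, @ipAddresses;

# Same set of addresses either way; only the ordering can differ.
my @fromKeys = sort keys %counts;
my @fromGrep = sort @uniqueIPs;
print "same set\n" if "@fromKeys" eq "@fromGrep";

# As a side effect, %counts holds the number of sightings per address.
print "$_ seen $counts{$_} time(s)\n" for @uniqueIPs;
```

Unlike keys, the grep result keeps the addresses in the order they first appeared in @ipAddresses.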

         "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom