Amoe has asked for the wisdom of the Perl Monks concerning the following question:

Hey there, perl monks blessed with wisdom from the HTML::Tree of knowledge. I'm having some trubbs with something. Some code, to be slightly more specific. I need to remove duplicates from an array. Easy, you might say, and it's in the FAQs. However, after struggling with that solution for what seems like an age, I finally see the problem (and this is gonna sound really stupid): I'm not de-duplicating a plain array of scalars, I'm de-duplicating an array of hashes (hash references, strictly speaking), each containing two keys. Let us call them 'bob' and 'alice'. The array would look like this:
my @array = ({bob => 'value1', alice => 'value2'}, {bob => 'othervalue1', alice => 'othervalue2'}); # and so on for loads of elements...
Obviously this wouldn't be very useful, but the hashes contain *swishes hands* top secret data. (Not really top secret; I just prefer to remain mysterious.) So here's the crux of the matter when removing duplicates from this array: I need to remove the whole hash whenever its 'bob' field is a duplicate, regardless of whether the 'alice' field is also a duplicate. So:
my $hash = {bob => 'I am a value', alice => "You're not a value."}; my $hash2 = {bob => 'I am a value', alice => "You bloody well aren't a value!"};
The second hash would be considered a duplicate of the first, since the two bob fields are equal, and it would be removed. Anyone up for suggesting how to do this? I'm pretty stumped.

--
my @words = grep({ref} @clever);
sub version { print "I cuss your capitalist pig tendencies bad!\n" }

Re: Removing duplicate hashes based on only one key
by merlyn (Sage) on Sep 20, 2001 at 20:16 UTC
      I know why it works, but would you mind explaining (in your inimitable way) to the young'uns what's going on? :-)

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.

        grep walks through every element of the array, setting $_ to each one in turn. He's also created a hash. Since $_ holds a reference to the current hash in the array, he grabs the value of its 'bob' field and uses it as a key in the temporary hash. The negation and post-increment magic just make the expression within grep return a true value if this is the first occurrence of the key, and a false value if the key has occurred before.

        Since grep returns a list of only those values for which its expression is true, it weeds out all of the duplicate elements.
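        To make the grep/hash interplay concrete, here is a rough equivalent written as an explicit foreach loop. It is only a sketch: the sample data and the names %seen and $href are made up for illustration, not taken from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample data: an array of hash references.
my @array = (
    { bob => 'carol', alice => 'ted' },
    { bob => 'dave',  alice => 'eve' },
    { bob => 'carol', alice => 'mallory' },   # duplicate 'bob' value
);

# Explicit-loop equivalent of: grep !$seen{$_->{bob}}++, @array
my %seen;        # counts how many times each 'bob' value has appeared
my @new_array;
foreach my $href (@array) {
    my $key = $href->{bob};
    # Keep the element only if this 'bob' value has not been seen before.
    push @new_array, $href unless $seen{$key};
    $seen{$key}++;
}

print join(', ', map { "$_->{bob}/$_->{alice}" } @new_array), "\n";
# prints "carol/ted, dave/eve"
```

        The grep version does the same bookkeeping, just folded into a single expression.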

        Once you understand how some of the more arcane operations (grep, map, sort) work on lists and how to manipulate list items within their expressions, you'll grok these tricks really easily.

        Re-quoting the code:

        my @new_array = do {
            my %bobs_your_uncle;
            grep !$bobs_your_uncle{$_->{bob}}++, @array;
        };

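        Basically, the do{} block executes the code inside the braces and evaluates to the value of its last statement; you could call a sub and get the same effect here. The last statement in the block is the grep, which returns a list, and that list is what ends up in @new_array. As a bonus, %bobs_your_uncle exists only inside the block, so the bookkeeping hash goes away once the result has been built.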
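        The remark that a sub would do the same job might look like the sketch below. The name uniq_by_key is hypothetical, and passing the key name as a parameter is an addition that is not in the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the hash references in @records,
# keeping only the first one seen for each distinct value of $key.
sub uniq_by_key {
    my ($key, @records) = @_;
    my %seen;                                  # fresh for every call
    return grep { !$seen{ $_->{$key} }++ } @records;
}

my @array = (
    { bob => 'carol', alice => 'ted' },
    { bob => 'carol', alice => 'someone else' },
);

my @new_array = uniq_by_key('bob', @array);
print scalar(@new_array), "\n";   # prints 1
```

        Like the do{} block, the sub gives the seen-hash a private scope, so repeated calls never see each other's bookkeeping.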

        The first statement inside the block declares a hash. Nothing special about that. The magic's in the second statement.

        grep CONDITION, LIST returns a list of the members of LIST for which CONDITION is true, setting $_ to each member in turn while it evaluates CONDITION. Here, the magic's almost all in the construction of the condition.
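        For anyone new to grep, a tiny illustration on a plain list of numbers (the data is invented) shows the filtering behaviour on its own, before any hashes get involved:

```perl
use strict;
use warnings;

# grep EXPR, LIST: evaluates EXPR with $_ set to each element
# and returns only the elements for which EXPR is true.
my @numbers = (1 .. 10);
my @evens   = grep $_ % 2 == 0, @numbers;   # comma form, as in the code above
my @big     = grep { $_ > 7 } @numbers;     # equivalent block form

print "@evens\n";   # prints "2 4 6 8 10"
print "@big\n";     # prints "8 9 10"
```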

        $_->{bob} is the value associated with the "bob" key of the current element of the array (which is a reference to a hash). For example, if the current element of the array is (to put it visually),

        { bob=>'carol', alice=>"ted" }

        $_->{bob} is "carol". OK, so we ask the question: is $bobs_your_uncle{carol} true? Well, if it's the first time we've seen it, no. That value's undefined. So that test -- with the negation in front of it -- turns out *TRUE* the first time "carol" is seen as the value of "bob" in the array. The value is undefined, which is false; not-that is true.
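        The first half of the test, looking up the 'bob' value and negating a hash entry that does not exist yet, can be checked in isolation. This sketch uses an ordinary variable $element in place of the $_ that grep would set; the data is made up.

```perl
use strict;
use warnings;

my %bobs_your_uncle;                          # empty: nothing seen yet
my $element = { bob => 'carol', alice => 'ted' };

my $key = $element->{bob};                    # same as $_->{bob} inside grep

# The hash entry does not exist yet, so looking it up gives undef (false) ...
my $seen_before = $bobs_your_uncle{$key} ? 1 : 0;
# ... and negating it gives true, so grep would keep this element.
my $keep = !$bobs_your_uncle{$key} ? 1 : 0;

print "$key $seen_before $keep\n";   # prints "carol 0 1"
```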

        The ++ on the end of the condition is a post-increment: it says "OK, hand back the current value of $bobs_your_uncle{carol} for the test, then increment the stored value by 1". So the ! sees the count from before this element was counted, and by the next time "carol" turns up, the hash entry has already been bumped.

        Bottom line: if this is the first time the grep "loop" has seen "carol", the undef gets converted to 0, and the value of $bobs_your_uncle{carol} is set to 1.
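        The post-increment behaviour described above, including the way a missing entry comes back as 0, can be seen directly in this small sketch (the key 'carol' is just sample data):

```perl
use strict;
use warnings;

my %bobs_your_uncle;

# Post-increment returns the OLD value, then stores old + 1.
# On a missing (undef) entry the old value is returned as 0, not undef.
my $first  = $bobs_your_uncle{carol}++;   # returns 0, entry becomes 1
my $second = $bobs_your_uncle{carol}++;   # returns 1, entry becomes 2

print "first=$first second=$second now=$bobs_your_uncle{carol}\n";
# prints "first=0 second=1 now=2"
print "keep first? ",  (!$first  ? 'yes' : 'no'), "\n";   # prints "keep first? yes"
print "keep second? ", (!$second ? 'yes' : 'no'), "\n";   # prints "keep second? no"
```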

        Thus, the next time "carol" is seen, the condition evaluates to false (the old value is now a positive integer, which is true in Perl, so its negation is false), and the element is not pushed onto the result list.

        All of which goes to show how frickin' cool this language is.

        perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); $rose = "smells sweet to degree $n"; *other_name = *rose; print "$other_name\n"'
Re: Removing duplicate hashes based on only one key
by roaima (Initiate) on Sep 20, 2001 at 20:32 UTC

    > I need to remove the whole hash if only the 'bob' field is a duplicate, regardless of whether the 'alice' field is also a duplicate

    What criteria do you use to determine which alice should be kept and which should be discarded as a duplicate?
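    The grep idiom above keeps the first hash for each bob value. If a different rule is wanted, for example keeping the last one, a possible sketch is to run the same filter over the reversed list and then reverse the result back. The sample data is invented, and "last one wins" is only one example of a criterion.

```perl
use strict;
use warnings;

my @array = (
    { bob => 'carol', alice => 'ted' },
    { bob => 'dave',  alice => 'eve' },
    { bob => 'carol', alice => 'mallory' },
);

# First occurrence wins (the grep idiom from the thread):
my %seen_first;
my @keep_first = grep { !$seen_first{ $_->{bob} }++ } @array;

# Last occurrence wins: scan the list backwards, then restore the order.
my %seen_last;
my @keep_last = reverse grep { !$seen_last{ $_->{bob} }++ } reverse @array;

print join(', ', map { $_->{alice} } @keep_first), "\n";   # prints "ted, eve"
print join(', ', map { $_->{alice} } @keep_last),  "\n";   # prints "eve, mallory"
```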

    Chris