Removing duplicate hashes based on only one key

Amoe has asked for the wisdom of the Perl Monks concerning the following question:

Hey there, Perl Monks blessed with wisdom from the HTML::Tree of knowledge. I'm having some trubbs with something. Some code, to be slightly more specific. I need to remove duplicates from an array. Easy, you might say, and it's in the FAQs. However, after struggling with that solution for what seems like an age, I finally see the problem (and this is gonna sound really stupid): I'm not de-duplicating an array of plain scalars; I'm de-duplicating an array of hashes (strictly, hash references), each containing two keys. Let us call them 'bob' and 'alice'. The array would look like this:

my @array = (
    { bob => 'value1',      alice => 'value2' },
    { bob => 'othervalue1', alice => 'othervalue2' },
    # ... and so on for loads of elements
);

Obviously this wouldn't be very useful, but the hashes contain *swishes hands* top secret data. (Not really top secret; I just prefer to remain mysterious). So when I'm removing duplicates from this array, here's the crux of the matter. I need to remove the whole hash if only the 'bob' field is a duplicate, regardless of whether the 'alice' field is also a duplicate. So:

my $hash  = { bob => 'I am a value', alice => "You're not a value." };
my $hash2 = { bob => 'I am a value', alice => "You bloody well aren't a value!" };

The second hash would be considered a duplicate of the first, because the two bob fields are equal, and would be removed. Anyone up for suggesting how to do this? I'm pretty stumped.

--
my @words = grep({ref} @clever);
sub version { print "I cuss your capitalist pig tendencies bad!\n" }
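The FAQ answer mentioned in the question (perlfaq4, "How can I remove duplicate elements from a list or array?") keys a %seen hash on each element itself. A small sketch, using the example hashes from the question, of why that idiom removes nothing here: used as a hash key, a hash reference stringifies to something like HASH(0x...), which is different for every distinct hash, so two hashes with the same 'bob' never look equal.

```perl
use strict;
use warnings;

# Example data from the question: both hashes share the same 'bob' value.
my $hash  = { bob => 'I am a value', alice => "You're not a value." };
my $hash2 = { bob => 'I am a value', alice => "You bloody well aren't a value!" };
my @array = ( $hash, $hash2 );

# The FAQ idiom, applied to the elements themselves. Each element is a hash
# reference; as a hash key it stringifies to something like "HASH(0x...)",
# which differs for every distinct hash, so no element is ever "seen" twice.
my %seen;
my @unique = grep { !$seen{$_}++ } @array;

print scalar(@unique), " of ", scalar(@array), " elements kept\n";   # 2 of 2 elements kept
```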

Re: Removing duplicate hashes based on only one key by merlyn (Sage) on Sep 20, 2001 at 20:16 UTC
`my @new_array = do { my %bobs_your_uncle; grep !$bobs_your_uncle{$_->{bob}}++, @array; };`

-- Randal L. Schwartz, Perl hacker
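For anyone who wants to try merlyn's one-liner, here is a minimal self-contained run of it. The sample records are made up for illustration; the idiom itself is exactly the one posted.

```perl
use strict;
use warnings;

# Made-up sample data: the third and fourth records repeat earlier 'bob' values.
my @array = (
    { bob => 'carol', alice => 'ted' },
    { bob => 'dave',  alice => 'erin' },
    { bob => 'carol', alice => 'frank' },
    { bob => 'dave',  alice => 'erin' },
);

# merlyn's idiom: %bobs_your_uncle exists only inside the do block, and grep
# keeps an element only the first time its 'bob' value is seen.
my @new_array = do {
    my %bobs_your_uncle;
    grep !$bobs_your_uncle{ $_->{bob} }++, @array;
};

printf "%s => %s\n", $_->{bob}, $_->{alice} for @new_array;
# carol => ted
# dave => erin
```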
Re: Re: Removing duplicate hashes based on only one key by dragonchild (Archbishop) on Sep 20, 2001 at 20:31 UTC
I know why it works, but would you mind explaining (in your inimitable way) to the young'uns what's going on? :-)

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
Re: Re: Re: Removing duplicate hashes based on only one key by chromatic (Archbishop) on Sep 20, 2001 at 21:21 UTC
grep walks through every element of the array, aliasing $_ to each one in turn. He's also created a hash. Since $_ is a reference to one of the hashes in the array, he grabs the value of its 'bob' field and uses it as a key in the temporary hash. The negation and postincrement magic just make the expression within grep return a true value if this is the first occurrence of the key, and false if the key has reoccurred. Since grep returns a list of only those values for which its expression is true, it weeds out all of the duplicate elements. Once you understand how some of the more arcane operations (grep, map, sort) work on lists and how to manipulate list items within their expressions, you'll grok these tricks really easily.
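chromatic's description can be unrolled into an explicit loop, which some readers may find easier to follow than the one-liner. This is a sketch that should behave the same as the grep version; it was not posted in the thread, and `unique_by_bob` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the first record seen for each 'bob' value.
sub unique_by_bob {
    my @records = @_;
    my %count;                        # 'bob' value => times seen so far
    my @kept;
    for my $rec (@records) {          # $rec plays the role of $_ in grep
        my $key = $rec->{bob};
        if (!$count{$key}) {          # never seen: undef is false, ! makes it true
            push @kept, $rec;         # first occurrence, keep it
        }
        $count{$key}++;               # the post-increment from the one-liner
    }
    return @kept;
}

my @unique = unique_by_bob(
    { bob => 'carol', alice => 'ted' },
    { bob => 'carol', alice => 'victor' },
    { bob => 'dave',  alice => 'erin' },
);
print join(', ', map { "$_->{bob}/$_->{alice}" } @unique), "\n";   # carol/ted, dave/erin
```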
Re: Re: Re: Removing duplicate hashes based on only one key by arturo (Vicar) on Sep 20, 2001 at 21:24 UTC
re-quoting code:

`my @new_array = do { my %bobs_your_uncle; grep !$bobs_your_uncle{$_->{bob}}++, @array; };`

Basically, the `do{}` block executes the code inside the braces; you could call a sub and get the same effect here. The last statement in the block is the `grep`, which returns a list, and that list is what ends up in `@new_array`.

The first line declares a hash. Nothing special about that. The magic's in the second line. `grep CONDITION, LIST` returns a list of the members of `LIST` for which `CONDITION` is true. Here, the magic's almost all in the construction of the condition. `$_->{bob}` is the value associated with the "bob" key of the current element of the array (which is a reference to a hash). E.g., if the current element of the array is (to put it visually)

`{ bob=>'carol', alice=>"ted" }`

then `$_->{bob}` is "carol".

OK, so we ask the question: is `$bobs_your_uncle{carol}` true? Well, if it's the first time we've seen it, no. That value's undefined. So that test, with the negation in front of it, turns out TRUE the first time "carol" is seen as the value of "bob" in the array. The value is undefined, which is false; not-that is true. The `++` on the end of the condition says "OK, increment `$bobs_your_uncle{carol}` by 1 after you've read its value for the test", which means the element is kept or dropped based on the old value, and the count goes up either way.

Bottom line: if this is the first time the `grep` "loop" has seen "carol", the `undef` gets converted to 0, and the value of `$bobs_your_uncle{carol}` is set to 1. Thus, the next time "carol" is seen, the condition evaluates to "false" (not-true: positive integer values are true in Perl), and the element is not pushed onto the result list.

All of which goes to show how frickin' cool this language is.

`perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); $rose = "smells sweet to degree $n"; *other_name = *rose; print "$other_name\n"'`
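To watch the undef-to-0 and post-increment behaviour arturo describes, this sketch records, for three made-up records sharing one 'bob' value, what the post-increment returned (the value the test sees) and what is stored afterwards. perlop documents that post-incrementing an undef value returns 0 rather than undef, so the first "before" value is 0, with no warning.

```perl
use strict;
use warnings;

# Made-up records that all share the same 'bob' value.
my @array = (
    { bob => 'carol', alice => 'ted' },
    { bob => 'carol', alice => 'victor' },
    { bob => 'carol', alice => 'wendy' },
);

my %bobs_your_uncle;
my @trace;
for (@array) {
    # Post-increment returns the OLD value (0 the first time, since the
    # entry starts out undef) and then bumps the stored count.
    my $before = $bobs_your_uncle{ $_->{bob} }++;
    my $keep   = !$before;    # what grep would test
    push @trace, sprintf "%s: before=%d after=%d keep=%s",
        $_->{alice}, $before, $bobs_your_uncle{ $_->{bob} }, $keep ? 'yes' : 'no';
}
print "$_\n" for @trace;
# ted: before=0 after=1 keep=yes
# victor: before=1 after=2 keep=no
# wendy: before=2 after=3 keep=no
```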
Re: Removing duplicate hashes based on only one key by roaima (Initiate) on Sep 20, 2001 at 20:32 UTC
> I need to remove the whole hash if only the 'bob' field is a duplicate, regardless of whether the 'alice' field is also a duplicate

What criteria do you use to determine which alice should be kept and which should be discarded as a duplicate?

Chris
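Chris's point is that the one-liner silently keeps the first record for each 'bob' (and therefore that record's alice). If a different rule is wanted, the grep has to change. As one example, assuming the last record for each bob should win instead, reversing the list before and after the grep does it; note that the result is then ordered by each bob's last appearance. The data is made up for illustration.

```perl
use strict;
use warnings;

my @array = (
    { bob => 'carol', alice => 'ted' },
    { bob => 'dave',  alice => 'erin' },
    { bob => 'carol', alice => 'victor' },
);

# Keep-first, as in merlyn's answer: the first 'carol' (alice => 'ted') wins.
my @keep_first = do {
    my %seen;
    grep { !$seen{ $_->{bob} }++ } @array;
};

# Keep-last (an assumed alternative rule): walk the list backwards so the
# last record for each bob is seen first, then restore forward order.
my @keep_last = do {
    my %seen;
    reverse grep { !$seen{ $_->{bob} }++ } reverse @array;
};

print join(', ', map { "$_->{bob}/$_->{alice}" } @keep_first), "\n";  # carol/ted, dave/erin
print join(', ', map { "$_->{bob}/$_->{alice}" } @keep_last),  "\n";  # dave/erin, carol/victor
```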