Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi perlmonks, I have an array of records where each record is a hash reference with an identifier (and other fields not shown). This array, however, has records with duplicate identifiers, and I need to rebuild the same array of records without the duplicates.
my $ref = [];
$ref->[0] = { id => 'a' };
$ref->[1] = { id => 'b' };
$ref->[2] = { id => 'c' };
$ref->[3] = { id => 'b' };
Given the above example, I would like to end up with the following.
$ref->[0] = { id => 'a' };
$ref->[1] = { id => 'b' };
$ref->[2] = { id => 'c' };
It doesn't really matter which 'b' record remains. Any nice perlish solutions out there, perhaps using a combination of map and grep? Thanks in advance, Michael

Replies are listed 'Best First'.
Re: removing duplicates from an array of hashes
by kcott (Archbishop) on Apr 17, 2014 at 04:26 UTC

    Here's a grep solution:

    my %seen;
    @$ref = grep { ! $seen{$_->{id}}++ } @$ref;

    My test is in the spoiler.

    Script:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Data::Dump;

    my $ref = [ map { +{ id => $_ } } qw{a b c b} ];
    dd $ref;

    my %seen;
    @$ref = grep { ! $seen{$_->{id}}++ } @$ref;
    dd $ref;

    Output:

    [{ id => "a" }, { id => "b" }, { id => "c" }, { id => "b" }]
    [{ id => "a" }, { id => "b" }, { id => "c" }]

    -- Ken

Re: removing duplicates from an array of hashes
by bigj (Monk) on Apr 17, 2014 at 04:53 UTC
    As TMTOWTDI, I'll add an in-place version:
    my %seen;
    for my $i (reverse (0 .. @$ref-1)) {
        # replaces the current item in the array with the last item
        # and removes the last element if the id was already seen
        $ref->[$i] = pop @$ref if $seen{$ref->[$i]->{id}}++;
    }
    Note: Ordering in the array might change
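    For instance, here is a minimal runnable sketch (mine, not from the post) of a case where the ordering does change: with input ids b, a, b, c, the duplicate b at index 0 is overwritten by the popped c.

    ```shell
    perl -E '
      my $ref = [ map { +{ id => $_ } } qw(b a b c) ];
      my %seen;
      for my $i (reverse (0 .. @$ref-1)) {
          $ref->[$i] = pop @$ref if $seen{$ref->[$i]->{id}}++;
      }
      say join ",", map { $_->{id} } @$ref;   # prints c,a,b
    '
    ```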

    Update: inplace version that keeps the ordering

    my %seen;
    my $removed = 0;
    for my $i (0 .. @$ref-1) {
        my $item = $ref->[$i];
        $seen{$item->{id}}++ ? $removed++ : ($ref->[$i-$removed] = $item);
    }
    splice @$ref, -$removed;
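    As a runnable sketch (mine, not from the post): note that if the array happens to contain no duplicates at all, $removed stays 0 and splice @$ref, -0 is splice @$ref, 0, which would empty the whole array, so the splice is guarded below.

    ```shell
    perl -E '
      my $ref = [ map { +{ id => $_ } } qw(a b c b) ];
      my %seen;
      my $removed = 0;
      for my $i (0 .. @$ref-1) {
          my $item = $ref->[$i];
          $seen{$item->{id}}++ ? $removed++ : ($ref->[$i-$removed] = $item);
      }
      # guard: with $removed == 0, splice @$ref, 0 would clear the array
      splice @$ref, -$removed if $removed;
      say join ",", map { $_->{id} } @$ref;   # prints a,b,c
    '
    ```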

    Greetings,
    Janek Schleicher

      Just a matter of idle curiosity... I notice you use @$ref-1 instead of $#$ref for obtaining the max array index. Why do you prefer this? If $[ were ever changed (but don't do that!), @$ref-1 would yield an incorrect value for the max index. (Although I suppose that one could argue that to be $[-safe, one ought always to iterate over $[ .. $#array rather than 0 .. $#array to safely visit the entire range of @array indices.)

      Also, in your second code example
          splice @$ref,-$removed;
      might be written as the (presumably) marginally faster
          $#$ref -= $removed;

      Update: I should make it clear that this post was prompted mainly by the non-idiomatic nature of the @$ref-1 expression. Any "lack-of-safety" issue is only a kind of cayenne pepper icing on the cake.
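      A minimal demonstration (mine, for illustration) of assigning to $#array to truncate it in place:

      ```shell
      perl -E 'my @a = (1 .. 10); $#a -= 3; say "@a"'   # prints 1 2 3 4 5 6 7
      ```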

        Reasoning is simple:

        I hadn't programmed for a long time (close to 10 years), but I was still sure that $#array gives the last index. I was too lazy to look up how to write it for an array reference, though, so I just used 0 .. @$arrayref-1, which works the same here and, at least when I stopped, was just as idiomatic. Every programmer in the world understands 0..length(array)-1 in a heartbeat; only Perl cracks will understand $[..$#array.

        I wasn't aware that I could just do $#array -= $n to make the array smaller. To be honest, I'm not sure whether I'd like it. It only works in the rare cases where we basically want pop @array, $n and don't care what the last items were. splice is a clear idiom saying that we intend to remove entries from an array, and then we specify which. If the special case came up more often, OK, but how often do we even use splice? IMHO even that is rare; most of the time we pop, shift, or slice with @array[4..7,11..13] and so on, so there is no need to trick ourselves just for the sake of a trick. Code should be easy.

        Anyway, pretty cool to have learned some new tricks in Perl :-)

        Greetings,
        Janek Schleicher

Re: removing duplicates from an array of hashes
by atcroft (Abbot) on Apr 17, 2014 at 04:02 UTC

    This was the first thing that popped into my head:

    Code:

    perl -MData::Dumper -le '
        my $ref = [];
        $ref->[0] = { id => "a" };
        $ref->[1] = { id => "b" };
        $ref->[2] = { id => "c" };
        $ref->[3] = { id => "b" };
        print Data::Dumper->Dump( [ \$ref, ], [ qw( *ref ) ] );
        my $temp;
        my %seen;
        while ( my $t = shift @{$ref} ) {
            if ( not defined $seen{$t->{id}} ) {
                push @{$temp}, $t;
                $seen{$t->{id}}++;
            }
        }
        print Data::Dumper->Dump( [ \$temp, ], [ qw( *temp ) ] );
    '
    Output:

    Hope that helps.

Re: removing duplicates from an array of hashes
by NetWallah (Canon) on Apr 17, 2014 at 04:04 UTC
    One liner (Formatted):
    perl -MData::Dumper -E '
        my $r = [ map { id => $_ }, ("a".."c","b") ];
        say Dumper $r;
        my %h;
        my @z = map { $h{$_->{id}}++ ? () : $_ } @$r;
        say Dumper \@z;
    '
    Output:
    $VAR1 = [ { 'id' => 'a' }, { 'id' => 'b' }, { 'id' => 'c' }, { 'id' => 'b' } ];
    $VAR1 = [ { 'id' => 'a' }, { 'id' => 'b' }, { 'id' => 'c' } ];

            What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?
                  -Larry Wall, 1992

Re: removing duplicates from an array of hashes
by NetWallah (Canon) on Apr 17, 2014 at 05:32 UTC
    In-place TIMTOWTDI using 'delete', inspired by bigj (++):
    perl -MData::Dumper -E '
        my $r = [ map { id => $_ }, ("a".."c","b") ];
        say Dumper $r;
        my %h;
        $h{$r->[$_]{id}}++ and delete $r->[$_] for 0..$#$r;
        say Dumper $r;
    '
    IMHO, kcott's grep (++) is the cleanest, and classic/canonical.


      The disadvantage is that delete works well on hashes, but badly (and is deprecated) on arrays. It does not really delete an entry but just undefs it (with the exception of when it is the last element(s), so it worked in the original example; but if you put, e.g., two 'a' ids at the start of the array, you'll see it). See also the documentation of delete.

      Greetings,
      Janek Schleicher

      PS: I agree that the grep solution is the usual way. I was just interested in writing an in-place algorithm, as sometimes that's useful, too, when working with big data.
        Thanks for pointing out that "delete $array[$idx]" is deprecated and generates undefs.

        Just to illustrate the subtle behaviour that was masked in my previous post, here is a demo of potential disaster the appearance of the undef could cause:

        perl -MData::Dumper -E '
            my $r = [ map { id => $_ }, ("b","a".."c","b") ];
            say Dumper $r;
            my %h;
            $h{$r->[$_]{id}}++ and delete $r->[$_] for 0..$#$r;
            say Dumper $r;
        '

        --- SECOND (Relevant) PART of OUTPUT ---
        $VAR1 = [ { 'id' => 'b' }, { 'id' => 'a' }, undef, { 'id' => 'c' } ];


Re: removing duplicates from an array of hashes
by vinoth.ree (Monsignor) on Apr 17, 2014 at 07:26 UTC

    Hi,

    Here is another way of removing duplicate elements from an array of hashes.

    use strict;
    use warnings;
    use Data::Dumper;

    my $ref = [];
    $ref->[0] = { id => 'a' };
    $ref->[1] = { id => 'b' };
    $ref->[2] = { id => 'c' };
    $ref->[3] = { id => 'b' };

    # but using a temp hash
    my $temp_hash = {};
    my @Unique_Array_Of_Hash = grep { $_ && ++$temp_hash->{$_->{id}} < 2 } @$ref;
    print Dumper \@Unique_Array_Of_Hash;

    All is well
Re: removing duplicates from an array of hashes
by hdb (Monsignor) on Apr 17, 2014 at 10:38 UTC

    Using an anonymous hash, not maintaining the original order:

    @$ref = values %{{ map { $_->{'id'} => $_ } @$ref }};
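    A quick runnable sketch (mine, not from the post): since which duplicate survives is the last one seen for each id, and values returns hash order, the ids are sorted here just to get a stable result for checking.

    ```shell
    perl -E '
      my $ref = [ map { +{ id => $_ } } qw(a b c b) ];
      @$ref = values %{{ map { $_->{id} => $_ } @$ref }};
      say join ",", sort map { $_->{id} } @$ref;   # prints a,b,c
    '
    ```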
      You perlmonks are seriously amazing! Can't thank you enough!