Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi perlmonks, I have an array of records where each record is a hash reference with an identifier (and other fields not shown). This array, however, has records with duplicate identifiers, and I need to rebuild the same array of records without the duplicates.
my $ref = [];
$ref->[0] = { id => 'a' };
$ref->[1] = { id => 'b' };
$ref->[2] = { id => 'c' };
$ref->[3] = { id => 'b' };
Given the above example, I would like to end up with the following.
$ref->[0] = { id => 'a' };
$ref->[1] = { id => 'b' };
$ref->[2] = { id => 'c' };
It doesn't really matter which 'b' record remains. Any nice perlish solutions out there, perhaps using a combination of map and grep? Thanks in advance, Michael

Replies are listed 'Best First'.
Re: removing duplicates from an array of hashes
by kcott (Archbishop) on Apr 17, 2014 at 04:26 UTC

    Here's a grep solution:

    my %seen;
    @$ref = grep { ! $seen{$_->{id}}++ } @$ref;

    My test is in the spoiler.

    Script:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Data::Dump;

    my $ref = [ map { +{ id => $_ } } qw{a b c b} ];
    dd $ref;

    my %seen;
    @$ref = grep { ! $seen{$_->{id}}++ } @$ref;
    dd $ref;

    Output:

    [{ id => "a" }, { id => "b" }, { id => "c" }, { id => "b" }]
    [{ id => "a" }, { id => "b" }, { id => "c" }]

    -- Ken

Re: removing duplicates from an array of hashes
by bigj (Monk) on Apr 17, 2014 at 04:53 UTC
    As TMTOWTDI, I'll add an in-place version:
    my %seen;
    for my $i (reverse (0 .. @$ref-1)) {
        # replaces the current item in the array with the last item
        # and removes the last element if the id was already seen
        $ref->[$i] = pop @$ref if $seen{$ref->[$i]->{id}}++;
    }
    Note: Ordering in the array might change
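    For instance, here is a minimal runnable sketch (mine, not from the post) of a case where the ordering does change: with input ids b, a, b, c, the duplicate b at index 0 is overwritten by the popped c.

    ```shell
    perl -E '
      my $ref = [ map { +{ id => $_ } } qw(b a b c) ];
      my %seen;
      for my $i (reverse (0 .. @$ref-1)) {
          $ref->[$i] = pop @$ref if $seen{$ref->[$i]->{id}}++;
      }
      say join ",", map { $_->{id} } @$ref;   # prints c,a,b
    '
    ```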

    Update: inplace version that keeps the ordering

    my %seen;
    my $removed = 0;
    for my $i (0 .. @$ref-1) {
        my $item = $ref->[$i];
        $seen{$item->{id}}++ ? $removed++ : ($ref->[$i-$removed] = $item);
    }
    splice @$ref, -$removed;
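    As a runnable sketch (mine, not from the post): note that if the array happens to contain no duplicates at all, $removed stays 0 and splice @$ref, -0 is splice @$ref, 0, which would empty the whole array, so the splice is guarded below.

    ```shell
    perl -E '
      my $ref = [ map { +{ id => $_ } } qw(a b c b) ];
      my %seen;
      my $removed = 0;
      for my $i (0 .. @$ref-1) {
          my $item = $ref->[$i];
          $seen{$item->{id}}++ ? $removed++ : ($ref->[$i-$removed] = $item);
      }
      # guard: with $removed == 0, splice @$ref, 0 would clear the array
      splice @$ref, -$removed if $removed;
      say join ",", map { $_->{id} } @$ref;   # prints a,b,c
    '
    ```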

    Greetings,
    Janek Schleicher

      Just a matter of idle curiosity... I notice you use @$ref-1 instead of $#$ref for obtaining the max array index. Why do you prefer this? If $[ were ever changed (but don't do that!), @$ref-1 would yield an incorrect value for the max index. (Although I suppose that one could argue that to be $[-safe, one ought always to iterate over $[ .. $#array rather than 0 .. $#array to safely visit the entire range of @array indices.)

      Also, in your second code example
          splice @$ref,-$removed;
      might be written as the (presumably) marginally faster
          $#$ref -= $removed;

      Update: I should make it clear that this post was prompted mainly by the non-idiomatic nature of the @$ref-1 expression. Any "lack-of-safety" issue is only a kind of cayenne pepper icing on the cake.
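      A minimal demonstration (mine, for illustration) of assigning to $#array to truncate it in place:

      ```shell
      perl -E 'my @a = (1 .. 10); $#a -= 3; say "@a"'   # prints 1 2 3 4 5 6 7
      ```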

        Reasoning is simple:

        I hadn't programmed for a long time (close to 10 years), but I was still sure that $#array gives the last index. I was too lazy to look up how to write it for an array reference, though, so I just used 0 .. @$arrayref-1, which works the same here and, at least when I stopped, was just as idiomatic. Every programmer in the world understands 0..length(array)-1 in a heartbeat; only Perl cracks will understand $[..$#array.

        I wasn't aware that I could just do $#array -= $n to make the array smaller. To be honest, I'm not sure whether I'd like it. It only works in the rare cases where we basically want pop @array, $n and don't care what the last items were. splice is a clear idiom saying that we intend to remove entries from an array, and then we specify which. If the special case came up more often, OK, but how often do we even use splice? IMHO even that is rare; most of the time we pop, shift, or slice with @array[4..7,11..13] and so on, so there is no need to trick ourselves just for the sake of a trick. Code should be easy.

        Anyway, pretty cool to have learned some new tricks in Perl :-)

        Greetings,
        Janek Schleicher

Re: removing duplicates from an array of hashes
by atcroft (Abbot) on Apr 17, 2014 at 04:02 UTC

    This was the first thing that popped into my head:

    Code:

    perl -MData::Dumper -le '
        my $ref = [];
        $ref->[0] = { id => "a" };
        $ref->[1] = { id => "b" };
        $ref->[2] = { id => "c" };
        $ref->[3] = { id => "b" };
        print Data::Dumper->Dump( [ \$ref, ], [ qw( *ref ) ] );
        my $temp;
        my %seen;
        while ( my $t = shift @{$ref} ) {
            if ( not defined $seen{$t->{id}} ) {
                push @{$temp}, $t;
                $seen{$t->{id}}++;
            }
        }
        print Data::Dumper->Dump( [ \$temp, ], [ qw( *temp ) ] );
    '
    Output:

    Hope that helps.

Re: removing duplicates from an array of hashes
by NetWallah (Canon) on Apr 17, 2014 at 04:04 UTC
    One liner (Formatted):
    perl -MData::Dumper -E '
        my $r = [ map { id => $_ }, ("a".."c","b") ];
        say Dumper $r;
        my %h;
        my @z = map { $h{$_->{id}}++ ? () : $_ } @$r;
        say Dumper \@z;
    '
    Output:
    $VAR1 = [ { 'id' => 'a' }, { 'id' => 'b' }, { 'id' => 'c' }, { 'id' => 'b' } ];
    $VAR1 = [ { 'id' => 'a' }, { 'id' => 'b' }, { 'id' => 'c' } ];

            What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?
                  -Larry Wall, 1992

Re: removing duplicates from an array of hashes
by NetWallah (Canon) on Apr 17, 2014 at 05:32 UTC
    In-place TIMTOWTDI using 'delete', inspired by bigj (++):
    perl -MData::Dumper -E '
        my $r = [ map { id => $_ }, ("a".."c","b") ];
        say Dumper $r;
        my %h;
        $h{$r->[$_]{id}}++ and delete $r->[$_] for 0..$#$r;
        say Dumper $r;
    '
    IMHO, kcott's grep (++) is the cleanest, and classic/canonical.


      The disadvantage is that delete works well on hashes, but badly (and is deprecated) on arrays. It does not really delete an entry but just undefs it (with the exception of when it is the last element(s), so it worked in the original example; but if you put, e.g., two 'a' ids at the start of the array, you'll see it). See also the documentation of delete.

      Greetings,
      Janek Schleicher

      PS: I agree that the grep solution is the usual way. I was just interested in writing an in-place algorithm, as sometimes that's useful, too, when working with big data.
        Thanks for pointing out that "delete $array[$idx]" is deprecated and generates undefs.

        Just to illustrate the subtle behaviour that was masked in my previous post, here is a demo of potential disaster the appearance of the undef could cause:

        perl -MData::Dumper -E '
            my $r = [ map { id => $_ }, ("b","a".."c","b") ];
            say Dumper $r;
            my %h;
            $h{$r->[$_]{id}}++ and delete $r->[$_] for 0..$#$r;
            say Dumper $r;
        '

        --- SECOND (Relevant) PART of OUTPUT ---
        $VAR1 = [ { 'id' => 'b' }, { 'id' => 'a' }, undef, { 'id' => 'c' } ];


Re: removing duplicates from an array of hashes
by vinoth.ree (Monsignor) on Apr 17, 2014 at 07:26 UTC

    Hi,

    Here is another way of removing duplicate elements from an array of hashes.

    use strict;
    use warnings;
    use Data::Dumper;

    my $ref = [];
    $ref->[0] = { id => 'a' };
    $ref->[1] = { id => 'b' };
    $ref->[2] = { id => 'c' };
    $ref->[3] = { id => 'b' };

    # but using a temp hash
    my $temp_hash = {};
    my @Unique_Array_Of_Hash = grep { $_ && ++$temp_hash->{$_->{id}} < 2 } @$ref;
    print Dumper \@Unique_Array_Of_Hash;

    All is well
Re: removing duplicates from an array of hashes
by hdb (Monsignor) on Apr 17, 2014 at 10:38 UTC

    Using an anonymous hash, not maintaining the original order:

    @$ref = values %{{ map { $_->{'id'} => $_ } @$ref }};
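    A quick runnable sketch (mine, not from the post): since which duplicate survives is the last one seen for each id, and values returns hash order, the ids are sorted here just to get a stable result for checking.

    ```shell
    perl -E '
      my $ref = [ map { +{ id => $_ } } qw(a b c b) ];
      @$ref = values %{{ map { $_->{id} => $_ } @$ref }};
      say join ",", sort map { $_->{id} } @$ref;   # prints a,b,c
    '
    ```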
      You perlmonks are seriously amazing! Can't thank you enough!