Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks

I have an array of arrays, @MyArrayOfArray. Each element is an array containing x textual values (x = 3 in the example below, but it may be larger, though fewer than 20; all inner arrays have the same number of elements). I need to eliminate all exact duplicates among the inner arrays. What is the best way to achieve this?

my @MyArrayOfArray;
my $a = [ "hello", "day", "sun and fun" ];
my $b = [ 2, "okay", "may" ];
my $c = [ "hello", "day", "sun and fun" ];
push (@MyArrayOfArray, $a);
push (@MyArrayOfArray, $b);
push (@MyArrayOfArray, $c);

Re: Remove Array Duplicates from Array of Arrays
by tybalt89 (Monsignor) on Sep 01, 2018 at 12:37 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $a = [ "hello", "day", "sun and fun" ];
    my $b = [ 2, "okay", "may" ];
    my $c = [ "hello", "day", "sun and fun" ];

    my %dup;
    my @MyArrayOfArray = grep !$dup{Dumper $_}++, $a, $b, $c;

    print Dumper \@MyArrayOfArray;

    Outputs:

    $VAR1 = [
              [
                'hello',
                'day',
                'sun and fun'
              ],
              [
                2,
                'okay',
                'may'
              ]
            ];

    Avoids trying to find a unique separator.
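    For comparison, here is a minimal sketch of another way to sidestep the separator problem without Data::Dumper: prefix each element with its length, so the key can only be split back in one way. The helper name row_key and the sample rows are made up for illustration, and the sketch assumes every element is a defined, plain string or number.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash key by prefixing each element
# with its length and a colon, e.g. [ 'ab', 'c' ] becomes "2:ab1:c".
# Assumes all elements are defined scalars (no undef, no nested refs).
sub row_key {
    my ($row) = @_;
    return join '', map { length($_) . ':' . $_ } @$row;
}

my @rows = (
    [ "hello", "day", "sun and fun" ],
    [ 2, "okay", "may" ],
    [ "hello", "day", "sun and fun" ],   # exact duplicate of the first row
    [ "hello", "daysun and fun" ],       # same characters, different split
);

# Keep a row only the first time its key is seen; original order is preserved.
my %seen;
my @unique = grep { !$seen{ row_key($_) }++ } @rows;

print scalar(@unique), " unique rows\n";
```

    The trade-off is that the key is less readable than Dumper output, but it does not depend on any Data::Dumper settings.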

      This is awesome, thank you.

Re: Remove Array Duplicates from Array of Arrays
by Dallaylaen (Chaplain) on Sep 01, 2018 at 10:08 UTC

    As others already suggested, you need to (1) use a hash for uniqueness and (2) serialize your arrays to make sure they don't match by chance. Here is one of many possible implementations:

    #!/usr/bin/env perl
    # https://www.perlmonks.org/?node_id=1221504

    use strict;
    use warnings;
    use Data::Dumper;

    my @aoa = (
        [qw[sun and fun]],
        [qw[sunand fun]],
        [qw[sun and fun]],
        [qw[fun and sun]],
    );

    # first off, serialize the array somehow
    # Multiple methods may exist, depending on expected content of arrays
    sub concat {
        my $array = shift;
        # we replace \ with \\ and use a literal \n (backslash + n) for delimiter,
        # so no confusion may occur; the substitution runs on a copy of each
        # element so that the original arrays are not modified
        return join "\\n", map { (my $s = $_) =~ s/\\/\\\\/g; $s } @$array;
    }

    # Use a hash for uniqueness
    # this would've been grep { !$uniq{$_}++ } @aoa if @aoa was just strings
    my %uniq;
    my @no_dupes = grep { !$uniq{ concat($_) }++ } @aoa;

    # Check the data
    print Dumper(\@no_dupes);
    print Dumper(\%uniq);

Re: Remove Array Duplicates from Array of Arrays
by Athanasius (Archbishop) on Sep 01, 2018 at 08:34 UTC

    Here’s one approach: convert each inner array into a single string, and store it in a hash for future lookup:

    use strict;
    use warnings;
    use Data::Dump;

    my @MyArrayOfArray = (
        [ "hello", "day", "sun and fun" ],
        [ 2, "okay", "may" ],
        [ "hello", "day", "sun and fun" ],
        [ 2, "okay", "may" ],
        [ "hello", "sun and fun", "day" ],
        [ 2, "okay", "may" ],
    );

    my (%hash, @AoA2);

    for (@MyArrayOfArray)
    {
        my $key = join '', @$_;
        push @AoA2, $_ unless exists $hash{ $key };
        ++$hash{ $key };
    }

    dd \@AoA2;

    Output:

    18:32 >perl 1924_SoPW.pl
    [
      ["hello", "day", "sun and fun"],
      [2, "okay", "may"],
      ["hello", "sun and fun", "day"],
    ]

    18:32 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      my $key = join '', @$_;

      Note that join-ing with the empty string means that, e.g.,  [ 'hello', 'sun and fun', 'day' ] cannot be distinguished from the arguably different subarray  [ 'hello', 'sun and funday' ] (among other permutations):

      c:\@Work\Perl\monks\Anonymous Monk>perl -wMstrict -le
      "use Data::Dump;
       ;;
       my @MyArrayOfArray = (
         [ 'hello', 'sun and fun', 'day' ],
         [ 2, 'okay', 'may' ],
         [ 'hello', 'sun and funday' ],
         [ 2, 'okay', 'may' ],
       );
       ;;
       my (%hash, @AoA2);
       ;;
       for (@MyArrayOfArray) {
         my $key = join '', @$_;
         push @AoA2, $_ unless exists $hash{ $key };
         ++$hash{ $key };
       }
       ;;
       dd \@AoA2;
      "
      [["hello", "sun and fun", "day"], [2, "okay", "may"]]

      Joining with a string that is guaranteed not to appear in any element avoids this. Here I use  $; (see perlvar), whose default value ("\034") just happens to work in this particular case:

      c:\@Work\Perl\monks\Anonymous Monk>perl -wMstrict -le
      "use Data::Dump;
       ;;
       my @MyArrayOfArray = (
         [ 'hello', 'sun and fun', 'day' ],
         [ 2, 'okay', 'may' ],
         [ 'hello', 'sun and funday' ],
         [ 2, 'okay', 'may' ],
       );
       ;;
       my (%hash, @AoA2);
       ;;
       for (@MyArrayOfArray) {
         my $key = join $;, @$_;
         push @AoA2, $_ unless exists $hash{ $key };
         ++$hash{ $key };
       }
       ;;
       dd \@AoA2;
      "
      [
        ["hello", "sun and fun", "day"],
        [2, "okay", "may"],
        ["hello", "sun and funday"],
      ]

      Update: Slightly more concisely:

      c:\@Work\Perl\monks\Anonymous Monk>perl -wMstrict -le
      "use Data::Dump;
       ;;
       my @MyArrayOfArray = (
         [ 'hello', 'sun and fun', 'day' ],
         [ 2, 'okay', 'may' ],
         [ 'hello', 'sun and funday' ],
         [ 2, 'okay', 'may' ],
       );
       ;;
       my @AoA2 = do {
         my %seen;
         grep ! $seen{ join $;, @$_ }++, @MyArrayOfArray;
       };
       ;;
       dd \@AoA2;
      "
      [
        ["hello", "sun and fun", "day"],
        [2, "okay", "may"],
        ["hello", "sun and funday"],
      ]

      (Update: Posted this update before I saw Dallaylaen's post with essentially the same idea.)


      Give a man a fish:  <%-{-{-{-<

Re: Remove Array Duplicates from Array of Arrays
by Laurent_R (Canon) on Sep 01, 2018 at 08:40 UTC
    One possibility would be to sort your AoA according to the items in your sub-arrays, and then to walk through the AoA and, for each sub-array, check whether it is equal to the previous (or next) one.
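
    The sort-and-compare idea could be sketched roughly as follows. The helper names cmp_rows and uniq_sorted are invented for this illustration, elements are compared as strings, and all rows are assumed to have the same length, as stated in the question. Note that the result comes back in sorted order, not in the original order.

```perl
use strict;
use warnings;

# Compare two array refs element by element, as strings.
# Returns -1, 0 or 1, like cmp. Assumes both rows have the same length.
sub cmp_rows {
    my ($x, $y) = @_;
    for my $i (0 .. $#$x) {
        my $c = $x->[$i] cmp $y->[$i];
        return $c if $c;
    }
    return 0;
}

# Sort the rows, then keep a row only if it differs from the last kept row.
sub uniq_sorted {
    my @sorted = sort { cmp_rows($a, $b) } @_;
    my @out;
    for my $row (@sorted) {
        push @out, $row unless @out && cmp_rows($out[-1], $row) == 0;
    }
    return @out;
}

my @aoa = (
    [ "hello", "day", "sun and fun" ],
    [ 2, "okay", "may" ],
    [ "hello", "day", "sun and fun" ],
);

my @no_dupes = uniq_sorted(@aoa);
print join(", ", @$_), "\n" for @no_dupes;
```

    Because rows are compared element by element, no separator is needed at all, at the cost of an O(n log n) sort.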

    Another possibility would be to stringify the subarrays and store them in a hash (with the stringified sub-arrays as keys and sub-array references as values). The hash will automatically remove duplicates, so all you need at this point is to collect the hash values and store them in a new AoA.
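
    A minimal sketch of this second idea, with the stringified row as key and the row reference as value. The name uniq_by_hash is made up here, and joining on $; assumes that no element contains that character ("\034" by default). The values come back in no particular order.

```perl
use strict;
use warnings;

# Store each row under its stringified form; later duplicates are ignored,
# so the hash ends up holding one reference per distinct row.
sub uniq_by_hash {
    my %by_key;
    for my $row (@_) {
        my $key = join $;, @$row;    # assumes no element contains $;
        $by_key{$key} = $row unless exists $by_key{$key};
    }
    return values %by_key;           # order of the result is not defined
}

my @aoa = (
    [ "hello", "day", "sun and fun" ],
    [ 2, "okay", "may" ],
    [ "hello", "day", "sun and fun" ],
);

my @no_dupes = uniq_by_hash(@aoa);
print scalar(@no_dupes), " distinct rows\n";
```

    If the original order matters, the grep-with-%seen form shown in the other replies keeps it, whereas collecting hash values does not.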

    Update at 8:44 UTC: I had not seen it when I started to post, but the second solution above is more or less equivalent to the solution suggested by Athanasius.