Remove Array Duplicates from Array of Arrays

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:


Re: Remove Array Duplicates from Array of Arrays
by tybalt89 (Monsignor) on Sep 01, 2018 at 12:37 UTC

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

# Named $x, $y, $z rather than $a and $b, which sort() treats specially.
my $x = [ "hello", "day", "sun and fun" ];
my $y = [ 2, "okay", "may" ];
my $z = [ "hello", "day", "sun and fun" ];

my %dup;
my @MyArrayOfArray = grep !$dup{Dumper $_}++, $x, $y, $z;

print Dumper \@MyArrayOfArray;

Outputs:

$VAR1 = [
          [
            'hello',
            'day',
            'sun and fun'
          ],
          [
            2,
            'okay',
            'may'
          ]
        ];

Using the Dumper output as the hash key avoids having to find a separator that cannot occur in the data.
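A minimal sketch of the same idea applied to nested sub-arrays, where a plain join would not work at all. The sample data and the three Data::Dumper settings (Indent, Terse, Sortkeys) are my own additions, not part of the post above; they only make the keys shorter and, should hashes appear in the data, deterministic.

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumed settings: compact output, no "$VAR1 = " prefix,
# and hash keys (if any appear in the data) in sorted order.
local $Data::Dumper::Indent   = 0;
local $Data::Dumper::Terse    = 1;
local $Data::Dumper::Sortkeys = 1;

# Illustrative data: the first and third rows are structurally identical,
# the second has the same strings but a different (flat) structure.
my @aoa = (
    [ "hello", [ "day", "sun" ] ],
    [ "hello", "day", "sun" ],
    [ "hello", [ "day", "sun" ] ],
);

# Keep a row only the first time its serialized form is seen.
my %seen;
my @unique = grep { !$seen{ Dumper($_) }++ } @aoa;

print scalar(@unique), "\n";    # number of rows kept
```

Because the key is the full serialization, the nested and the flat row stay distinct even though they contain the same strings.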


Re^2: Remove Array Duplicates from Array of Arrays

by Anonymous Monk on Sep 01, 2018 at 13:10 UTC

This is awesome, thank you.


Re: Remove Array Duplicates from Array of Arrays
by Dallaylaen (Chaplain) on Sep 01, 2018 at 10:08 UTC

As others already suggested, you need to (1) use a hash for uniqueness and (2) serialize your arrays unambiguously, so that two different arrays never produce the same key by chance. Here is one of many possible implementations:

#!/usr/bin/env perl

# https://www.perlmonks.org/?node_id=1221504

use strict;
use warnings;
use Data::Dumper;

my @aoa = (
    [qw[sun and fun]],
    [qw[sunand fun]],
    [qw[sun and fun]],
    [qw[fun and sun]],
);

# first off, serialize the array somehow
# Multiple methods may exist, depending on expected content of arrays
sub concat {
    my $array = shift;
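    # Escape \ as \\ and join with the two-character sequence \n
    # (backslash, letter n), so element boundaries stay unambiguous.
    # Work on a copy: $_ is aliased to the element inside map, so
    # substituting on $_ directly would modify the caller's data.
    return join "\\n", map { (my $s = $_) =~ s/\\/\\\\/g; $s } @$array;
}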

# Use a hash for uniqueness
# this would've been grep { !$uniq{$_}++ } @aoa if @aoa was just strings
my %uniq;
my @no_dupes = grep { !$uniq{ concat($_) }++ } @aoa;

# Check the data
print Dumper(\@no_dupes);
print Dumper(\%uniq);


Re: Remove Array Duplicates from Array of Arrays
by Athanasius (Archbishop) on Sep 01, 2018 at 08:34 UTC

Here’s one approach: convert each inner array into a single string, and store it in a hash for future lookup:

use strict;
use warnings;
use Data::Dump;

my @MyArrayOfArray =
(
    [ "hello", "day",         "sun and fun" ],
    [  2,      "okay",        "may"         ],
    [ "hello", "day",         "sun and fun" ],
    [  2,      "okay",        "may"         ],
    [ "hello", "sun and fun", "day"         ],
    [  2,      "okay",        "may"         ],
);

my (%hash, @AoA2);

for (@MyArrayOfArray)
{
    my $key = join '', @$_;
    push @AoA2, $_ unless exists $hash{ $key };
    ++$hash{ $key };
}

dd \@AoA2;

Output:

18:32 >perl 1924_SoPW.pl
[
  ["hello", "day", "sun and fun"],
  [2, "okay", "may"],
  ["hello", "sun and fun", "day"],
]

18:32 >

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: Remove Array Duplicates from Array of Arrays

by AnomalousMonk (Archbishop) on Sep 01, 2018 at 09:13 UTC

my $key = join '', @$_;

Note that join-ing with the empty string means that, e.g., [ 'hello', 'sun and fun', 'day' ] cannot be distinguished from the arguably different subarray [ 'hello', 'sun and funday' ] (among other permutations):

c:\@Work\Perl\monks\Anonymous Monk>perl -wMstrict -le
"use Data::Dump;
 ;;
 my @MyArrayOfArray =
 (
     [ 'hello', 'sun and fun', 'day' ],
     [  2,      'okay',        'may' ],
     [ 'hello', 'sun and funday'     ],
     [  2,      'okay',        'may' ],
 );
 ;;
 my (%hash, @AoA2);
 ;;
 for (@MyArrayOfArray)
 {
     my $key = join '', @$_;
     push @AoA2, $_ unless exists $hash{ $key };
     ++$hash{ $key };
 }
 ;;
 dd \@AoA2;
"
[["hello", "sun and fun", "day"], [2, "okay", "may"]]

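The fix is to join with a separator that is guaranteed not to occur in any element, such as the default value of $; (see perlvar), assuming the data never contains that character: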

c:\@Work\Perl\monks\Anonymous Monk>perl -wMstrict -le
"use Data::Dump;
 ;;
 my @MyArrayOfArray =
 (
     [ 'hello', 'sun and fun', 'day' ],
     [  2,      'okay',        'may' ],
     [ 'hello', 'sun and funday'     ],
     [  2,      'okay',        'may' ],
 );
 ;;
 my (%hash, @AoA2);
 ;;
 for (@MyArrayOfArray)
 {
     my $key = join $;, @$_;
     push @AoA2, $_ unless exists $hash{ $key };
     ++$hash{ $key };
 }
 ;;
 dd \@AoA2;
"
[
  ["hello", "sun and fun", "day"],
  [2, "okay", "may"],
  ["hello", "sun and funday"],
]

Update: Slightly more concisely:

c:\@Work\Perl\monks\Anonymous Monk>perl -wMstrict -le
"use Data::Dump;
 ;;
 my @MyArrayOfArray = (
   [ 'hello', 'sun and fun', 'day' ],
   [  2,      'okay',        'may' ],
   [ 'hello', 'sun and funday'     ],
   [  2,      'okay',        'may' ],
   );
 ;;
 my @AoA2 = do {
   my %seen;
   grep ! $seen{ join $;, @$_ }++, @MyArrayOfArray;
   };
 ;;
 dd \@AoA2;
"
[
  ["hello", "sun and fun", "day"],
  [2, "okay", "may"],
  ["hello", "sun and funday"],
]

(Update: Posted this update before I saw Dallaylaen's post with essentially the same idea.)
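If the data cannot be trusted to be free of the $; character, one alternative is to prefix every element with its length, so that no separator is needed at all. This is only a sketch of a technique not used in the posts above: row_key is a hypothetical helper name, the sample rows are made up, and all elements are assumed to be defined strings or numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: each element is written as "<length>:<element>",
# so the key can be split back into elements in exactly one way.
sub row_key {
    my ($row) = @_;
    return join '', map { length($_) . ':' . $_ } @$row;
}

my @aoa = (
    [ 'hello', 'sun and fun', 'day' ],
    [ 'hello', 'sun and funday'     ],
    [ 'a' . $; . 'b' ],                    # one element containing $;
    [ 'a', 'b' ],                          # would collide with it under join $;
    [ 'hello', 'sun and fun', 'day' ],     # duplicate of the first row
);

# Keep a row only the first time its key is seen.
my %seen;
my @unique = grep { !$seen{ row_key($_) }++ } @aoa;

print scalar(@unique), "\n";    # number of rows kept
```

The third and fourth rows are the case where join $; breaks down: both would yield the key "a", $;, "b". With length prefixes the keys differ, so both rows are kept.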

Give a man a fish: <%-{-{-{-<


Re: Remove Array Duplicates from Array of Arrays
by Laurent_R (Canon) on Sep 01, 2018 at 08:40 UTC

Another possibility would be to stringify the subarrays and store them in a hash (with the stringified sub-arrays as keys and sub-array references as values). The hash will automatically remove duplicates, so all you need at this point is to collect the hash values and store them in a new AoA.

Update at 8:44 UTC: I had not seen it when I started to post, but the second solution above is more or less equivalent to the solution suggested by Athanasius.
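The hash-of-references idea described in the first paragraph could be sketched roughly as follows. No code was given in the post, so the variable names, the sample data, and the choice of join $; as the stringification are all assumptions on my part.

```perl
use strict;
use warnings;

my @aoa = (
    [ "hello", "day", "sun and fun" ],
    [ 2,       "okay", "may"        ],
    [ "hello", "day", "sun and fun" ],
);

# Key: stringified sub-array; value: reference to that sub-array.
# A later duplicate simply overwrites the entry stored under the same key.
my %by_key;
$by_key{ join $;, @$_ } = $_ for @aoa;

# The surviving references form the de-duplicated AoA.
# Note that values() returns them in no particular order.
my @unique = values %by_key;

print scalar(@unique), "\n";    # number of rows kept
```

Unlike the grep-based versions elsewhere in this thread, this does not preserve the original order of the sub-arrays, and for each set of duplicates it keeps the last one rather than the first.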
