simon.proctor has asked for the wisdom of the Perl Monks concerning the following question:

I need to remove all unique entries from a list of lists, such that all the lists end up the same. I have done this, but surely there must be a better way?

Data structure for array paring example
use strict;
use warnings;
use Data::Dumper;

my $temp = [
    [ '/', '/index.html', '/products/' ],
    [ '/', '/index.html', ],
    [ '/', '/index.html', '/products/' ],
    [ '/', '/index.html', '/test.html', ],
];

# Count for the smallest array
my $size     = @{$temp->[0]};
my $numterms = @{$temp};

# Build the initial list
my $seen = {};
map { $seen->{$_}++ } @{$temp->[0]};

my $success = 1;
for (my $i = 1; $i < @{$temp}; $i++) {
    map { $seen->{$_}++ if exists($seen->{$_}); } (@{$temp->[$i]});
}

my @results;
while (my ($key, $value) = each(%{$seen})) {
    if ($value == $numterms) {
        push @results, $key;
    }
}

print STDERR Data::Dumper->Dump([\@results]);
Note that the data structure is typically two to three times the size of the one above. When run, the code prints:

C:\WINNT\PROFILES\PROCTOS\DESKTOP>perl test.pl
$VAR1 = [
          '/',
          '/index.html'
        ];

C:\WINNT\PROFILES\PROCTOS\DESKTOP>


I have seen code for returning all the unique elements of a list, but not for doing this.

Thanks
Simon

Replies are listed 'Best First'.
Re: Remove unique elements from a list of lists
by PrakashK (Pilgrim) on Feb 27, 2002 at 14:44 UTC
    I need to remove all unique entries from a list of lists such that all lists are the same.
    From what you have done, it seems to me like you want to get the elements which exist in all lists.

    I can't suggest a better way, but it can be shorter by removing unnecessary code.

    use strict;
    use warnings;
    use Data::Dumper;

    my $temp = ... # your list of lists here

    my $numterms = @$temp;
    my $seen     = {};
    for my $term (@$temp) {
        map { $seen->{$_}++ } @$term;
    }
    my @results = grep { $seen->{$_} == $numterms } keys %$seen;

    print STDERR Data::Dumper->Dump([\@results]);
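
    A caveat, not raised in the thread: because $seen->{$_}++ runs once per occurrence, an element repeated within a single sub-list inflates its own count and can reach $numterms without appearing in every list. A minimal sketch of a guarded variant, deduplicating each row before counting:

    use strict;
    use warnings;
    use Data::Dumper;

    my $temp = [
        [ '/', '/index.html', '/index.html' ],   # '/index.html' repeated in one row
        [ '/', '/products/' ],
    ];

    my $numterms = @$temp;
    my $seen     = {};
    for my $term (@$temp) {
        my %in_row;                               # dedupe within this row
        $seen->{$_}++ for grep { !$in_row{$_}++ } @$term;
    }
    my @results = grep { $seen->{$_} == $numterms } keys %$seen;

    print STDERR Data::Dumper->Dump([\@results]); # prints just '/'; the unguarded
                                                  # count would also keep '/index.html'
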
    /prakash
(jeffa) Re: Remove unique elements from a list of lists
by jeffa (Bishop) on Feb 27, 2002 at 15:03 UTC
    Try this:
    my $temp = ...;

    my @compare = map { join('|', sort @$_) } sort @$temp;
    my %unique;
    @unique{@compare} = (1) x @compare;
    $temp = [ map { [ split /\|/, $_ ] } keys %unique ];

    print Dumper $temp;
    The idea is to create a sorted array of strings and compare those instead. Once duplicates are 'removed', you split the data back out. The key is to use a delimiter that will not show up in your data.
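
    Since '|' can legitimately show up in URL query strings, a NUL byte ("\0") may be a safer delimiter for path data like this; a minimal sketch of that variation (the delimiter choice is an assumption about the data, not from the original reply):

    use strict;
    use warnings;
    use Data::Dumper;

    my $temp = [
        [ '/', '/index.html' ],
        [ '/index.html', '/' ],              # same set, different order
    ];

    # NUL is unlikely to appear in path strings
    my @compare = map { join("\0", sort @$_) } @$temp;
    my %unique;
    @unique{@compare} = (1) x @compare;
    $temp = [ map { [ split /\0/, $_ ] } keys %unique ];

    print Dumper $temp;                      # the two rows collapse into one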

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Remove unique elements from a list of lists
by particle (Vicar) on Feb 27, 2002 at 15:02 UTC
    note: this won't work if your data structure is deeper, only if it is wider

    i'm flattening the array of arrays into a single list with duplicate entries, creating a hash with the elements of the sub-arrays as keys and their counts as the hash values.

    then i'm iterating over the hash; if a value equals the number of sub-arrays, that element appears in every one of them, so i print it.

    #!/usr/local/bin/perl -w
    use strict;
    use diagnostics;
    $|++;

    my $temp = [
        [ '/', '/index.html', '/products/', ],
        [ '/', '/index.html', ],
        [ '/', '/index.html', '/products/', ],
        [ '/', '/index.html', '/test.html', ],
    ];

    my ( $a_ref, $h_ref );

    for $a_ref ( @{ $temp } ) {
        $h_ref->{ $_ }++ for @{ $a_ref };
    }

    for ( keys %{ $h_ref } ) {
        print "$_\n" if $h_ref->{ $_ } == scalar @{ $temp };
    }
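
    To collect the common elements in an array instead of printing them, the final loop could be replaced with a grep (a small variation, not part of the original reply):

    my @results = grep { $h_ref->{$_} == @$temp } keys %$h_ref;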

    ~Particle ;Þ

Re: Remove unique elements from a list of lists
by simon.proctor (Vicar) on Feb 27, 2002 at 15:27 UTC
    Well, the current benchmarks are in (if I did it right :P).

    Update: added Zaxo's code.
    Benchmark: timing 100000 iterations of jeffa, zaxo, original, particle, prakash...
         jeffa:  7 wallclock secs ( 6.36 usr +  0.00 sys =  6.36 CPU) @ 15725.74/s (n=100000)
          zaxo:  6 wallclock secs ( 5.15 usr +  0.00 sys =  5.15 CPU) @ 19428.79/s (n=100000)
      original:  4 wallclock secs ( 3.98 usr +  0.00 sys =  3.98 CPU) @ 25150.91/s (n=100000)
      particle:  4 wallclock secs ( 3.83 usr +  0.00 sys =  3.83 CPU) @ 26136.96/s (n=100000)
       prakash:  4 wallclock secs ( 3.62 usr +  0.00 sys =  3.62 CPU) @ 27654.87/s (n=100000)
    So prakash is still in the lead :). I had to modify Zaxo's code slightly to make a local copy of the array passed in, as its use of shift is destructive (so calling it more than once wasn't originally possible).
      Have you tried with larger input data? Whilst many iterations is a good quick test, I think what you are dealing with is an n^2 problem (maybe higher), so a larger initial data set may spread the results more and give a clearer identification of the best algorithm (see the sketch below).
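
      A minimal sketch of such a test, assuming each solution has been wrapped in a sub first (prakash_common and particle_common are hypothetical wrapper names, not defined in the thread):

      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      # build a larger data set: 50 rows of 100 paths drawn from a pool of 200
      my @pool = map { "/page$_.html" } 1 .. 200;
      my $big  = [ map { [ map { $pool[ int rand @pool ] } 1 .. 100 ] } 1 .. 50 ];

      cmpthese( -5, {    # run each sub for at least 5 CPU seconds
          prakash  => sub { prakash_common($big)  },
          particle => sub { particle_common($big) },
      } );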

      ---If it doesn't fit use a bigger hammer
Re: Remove unique elements from a list of lists
by Zaxo (Archbishop) on Feb 27, 2002 at 15:26 UTC

    If I understand correctly, you want the list of elements common to all the lists, right? There is a fairly simple way to do that.

    The result must be a subset of every row, so start with the first row and make a hash of it, called %common. Look at the next row, and delete any key in %common which is not there. Lather, rinse, repeat. Here's the code:

    sub common {
        my $num = @_;
        my (%common, %test);

        $_ = shift;
        @common{@$_} = () x @$_;    # second use is scalar context

        while ( $_ = shift ) {
            %test = ();
            @test{@$_} = () x @$_;
            delete @common{ grep { ! exists $test{$_} } keys %common };
        }

        # parens around [keys %common] force list repetition,
        # so the same arrayref is returned $num times
        [ ( [keys %common] ) x $num ];
    }
    delete takes a list of arguments, which is provided by the slice over %common. The keys for that slice are selected by grep as those which are not present in the current array, as represented by keys %test.

    This approach discards ordering.
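
    If ordering matters, one could filter the first row against %common instead of taking keys; a minimal sketch of that variation (common_ordered is a hypothetical name, not from the original reply):

    sub common_ordered {
        my @rows = @_;
        my %common;
        @common{ @{ $rows[0] } } = ();
        for my $row ( @rows[ 1 .. $#rows ] ) {
            my %test;
            @test{@$row} = ();
            delete @common{ grep { !exists $test{$_} } keys %common };
        }
        # grep over the first row preserves its original order
        [ grep { exists $common{$_} } @{ $rows[0] } ];
    }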

    Update: modified the code to return a ref to an AoA corresponding to $temp in the root node. Update 2: fixed a typo in the code; parens instead of curlies in the %common init may save some time.

    After Compline,
    Zaxo