crenz has asked for the wisdom of the Perl Monks concerning the following question:

On a mailing list, somebody needed help with creating the intersection of several lists -- that is, creating a list of elements that are in all given lists. I came up with the following code. What further optimizations would be possible?

Note that I'm not looking for perl speed tricks, but rather algorithmic enhancements. I used the subroutines to make the main algorithm more readable; also, the implementation is not very perlish since that guy wanted to use it for a C++ library.

use strict; use warnings; my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); my $numSets = scalar @sets; my @intersection = (); my $currentSet = 0; my @marker = (0, 0, 0, 0, 0); my $currentItem; # helper functions sub stepMarker { $marker[$currentSet]++; if ($marker[$currentSet] >= scalar @{$sets[$currentSet]}) { # we are at the end of the current set # => abort, there cannot be any more common items last MAIN; } } sub nextSet { $currentSet++; $currentSet %= $numSets; } sub getCurrentItem { return $sets[$currentSet]->[$marker[$currentSet]]; } # main intersection algorithm my $activeItem = getCurrentItem; my $isPresent = 1; MAIN: while (1) { stepMarker(); nextSet; while (($currentItem = getCurrentItem()) < $activeItem) { stepMarker(); } if ($currentItem > $activeItem) { $activeItem = $currentItem; $isPresent = 1; } else { $isPresent++; if ($isPresent == $numSets) { # item has been found present in all sets push @intersection, $activeItem; stepMarker(); $activeItem = getCurrentItem; $isPresent = 1; } } } print "Common items in the $numSets sets: @intersection\n";

Replies are listed 'Best First'.
Re: Algorithm Golfing: Intersection of lists
by Zaxo (Archbishop) on Jan 07, 2003 at 00:46 UTC

    The Perl FAQ algorithm for intersection of two sets can be applied multiple times:

    #!/usr/bin/perl use warnings; use strict; my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); my $numsets = @sets; my %intersection; @intersection{ @{$sets[0]} } = (); for ( @sets[1..$#sets] ) { my %this; @this{ @$_ } = (); delete @intersection{ grep { ! exists $this{$_) } keys %intersection}; } my @intersection = sort {$a <=> $b} keys %intersection; print "Common items in the $numsets sets: @intersection\n"; __END__ prints: Common items in the 5 sets: 3 7 8 10 11 14
    The algorithmic gains here are in the lookup efficiency of hashes and the shrinkage of keys %intersection as keys are deleted.

    In C++ the STL map class template is available to act like a perl hash.

    After Compline,
    Zaxo

use Quantum::Superpositions (was Re: Algorithm Golfing: Intersection of lists)
by dragonchild (Archbishop) on Jan 07, 2003 at 15:18 UTC
    use Quantum::Superpositions; my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); my @common = sort {$a<=>$b}eigenstates( all( map { any(@$_) } @sets ) +); print "@common\n";

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Algorithm Golfing: Intersection of lists
by blokhead (Monsignor) on Jan 07, 2003 at 00:01 UTC
    Here's an algorithm that performs the operation, in a slightly more Perl-ish manner (heavy use of shift), but destroys the copies of the lists. It could easily be modified not to do so, by using a list of array offsets, but this should give you a start. In any case, it's just a heck of a lot easier in Perl to shift and always check the first element than to try to keep track of a list of array offsets.

    I don't know if it's optimal, but I'm happy with it:

    use Data::Dumper; my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); my @common = (); ### helpers sub max_element { my $max = -1; for (@sets) { $max = $_->[0] if $_->[0] > $max; } return $max; } sub all_the_same { my $element; for (@sets) { if (not defined $element) { $element = $_->[0]; } else { return 0 if $element != $_->[0]; } } return 1; } ### main loop my $done = 0; while (not $done) { foreach (@sets) { $done++ if @$_ == 0; } if (all_the_same()) { push @common, $sets[0][0]; shift @$_ for @sets; } else { my $max = max_element(); for (@sets) { shift @$_ if $_->[0] < $max; } } ## uncomment these to see how it works step by step: # print Dumper(\@sets, \@common), "\n"; # <STDIN> } print "@common\n";
    Here's a pseudocode breakdown of the algorithm, for illustrative purposes.
    common_elements ::= empty list; while all lists are non-empty; do if the first elements of all the lists are the same add that element to common_elements else max_element ::= maximum first element of all lists foreach list; do if first element of list < max_element; shift it off the list endif done
    Works kinda like a merge-sort by moving down all the lists in parallel. I don't know if that's what yours is trying to do. I can't quite grok it at the moment. Update: After adding some Dumper statements to your code, it seems like yours is doing a very similar algorithm, as you said, in a very C++ way. I'm not so sure what to think about the last MAIN; statement you have there. ;-)

    There is at least one part where code could be improved: the max_element sub could also return information about which arrays had non-maximum elements, so those arrays could easily be shifted. To do this cleanly and efficiently requires more cleverness than I have, however.

    HTH,

    blokhead

Re: Algorithm Golfing: Intersection of lists
by waswas-fng (Curate) on Jan 07, 2003 at 00:23 UTC
    my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); foreach $la (@sets) { %c= map { $_, $c{$_} += 1; } @$la; } foreach $la (sort { $a <=> $b } keys %c) { print "$la is in all sets.\n" if $c{$la} > $#sets; }


    -Waswas
Re: Algorithm Golfing: Intersection of lists
by gjb (Vicar) on Jan 07, 2003 at 00:59 UTC

    Yet another way to do it:

    #!perl use strict; use warnings; my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); my @intersection; print join("\n", map(join(" ", @$_), @sets)), "\n\n"; while (@sets == grep {@$_ > 0} @sets) { @sets = sort {$a->[0] <=> $b->[0]} @sets; if ($sets[0]->[0] == $sets[-1]->[0]) { push(@intersection, $sets[0]->[0]); shift(@$_) for @sets; } else { my @ne = grep {$sets[0]->[0] != $_->[0]} @sets; my @eq = grep {$sets[0]->[0] == $_->[0]} @sets; shift(@$_) for @eq; @sets = (@eq, @ne); } } print join(" ", @intersection), "\n";

    Just my 2 cents, -gjb-

Re: Algorithm Golfing: Intersection of lists
by jdporter (Paladin) on Jan 07, 2003 at 04:50 UTC
    Man, you people sure do like to make things complicated.

    Here's a function (I'm inclined to say the function, but this is Perl, so, TIMTOWTDI) which computes the list.
    sub common_subset { my $a = shift; @_ or return @$a; my %h; @h{@$a} = (); grep { exists $h{$_} } common_subset( @_ ) }
    You'd use it like this:
    my @sets = ( [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ], ); my @l = common_subset( @sets );
    (This solution should be apparent to anyone who's read SICP.)

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.