Algorithm Golfing: Intersection of lists

crenz has asked for the wisdom of the Perl Monks concerning the following question:

On a mailing list, somebody needed help with creating the intersection of several lists -- that is, creating a list of elements that are in all given lists. I came up with the following code. What further optimizations would be possible?

Note that I'm not looking for perl speed tricks, but rather algorithmic enhancements. I used the subroutines to make the main algorithm more readable; also, the implementation is not very perlish since that guy wanted to use it for a C++ library.

use strict;
use warnings;

my @sets =
    ([ 0, 1, 2, 3,          7, 8, 9, 10, 11, 12, 13, 14, 15,     ],
     [ 0,       3,          7, 8,    10, 11,     13, 14,     16, ],
     [ 0, 1, 2, 3,    5,    7, 8,    10, 11, 12, 13, 14,     16, ],
     [       2, 3, 4,    6, 7, 8,    10, 11, 12,     14,     16, ],
     [ 0, 1, 2, 3,          7, 8, 9, 10, 11,     13, 14, 15,     ]);

my $numSets = scalar @sets;
my @intersection = ();
my $currentSet = 0;
my @marker = (0, 0, 0, 0, 0);
my $currentItem;

# helper functions 

sub stepMarker {
    $marker[$currentSet]++;
    if ($marker[$currentSet] >= scalar @{$sets[$currentSet]}) {
        # we are at the end of the current set
        # => abort, there cannot be any more common items
        last MAIN;
    }
}

sub nextSet {
    $currentSet++;
    $currentSet %= $numSets;
}

sub getCurrentItem {
    return $sets[$currentSet]->[$marker[$currentSet]];
}


# main intersection algorithm

my $activeItem = getCurrentItem;
my $isPresent = 1;
MAIN: while (1) {
    stepMarker();
    nextSet;

    while (($currentItem = getCurrentItem()) < $activeItem) {
        stepMarker();
    }

    if ($currentItem > $activeItem) {
        $activeItem = $currentItem;
        $isPresent = 1;
    } else {
        $isPresent++;

        if ($isPresent == $numSets) {
            # item has been found present in all sets
            push @intersection, $activeItem;
            stepMarker();
            $activeItem = getCurrentItem;
            $isPresent = 1;
        }
    }
}

print "Common items in the $numSets sets: @intersection\n";
[download]

Comment on Algorithm Golfing: Intersection of lists Download Code

Replies are listed 'Best First'.
Re: Algorithm Golfing: Intersection of lists by Zaxo (Archbishop) on Jan 07, 2003 at 00:46 UTC
The Perl FAQ algorithm for intersection of two sets can be applied multiple times: #!/usr/bin/perl use warnings; use strict; my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); my $numsets = @sets; my %intersection; @intersection{ @{$sets[0]} } = (); for ( @sets[1..$#sets] ) { my %this; @this{ @$_ } = (); delete @intersection{ grep { ! exists $this{$_) } keys %intersection}; } my @intersection = sort {$a <=> $b} keys %intersection; print "Common items in the $numsets sets: @intersection\n"; __END__ prints: Common items in the 5 sets: 3 7 8 10 11 14 [download] The algorithmic gains here are in the lookup efficiency of hashes and the shrinkage of `keys %intersection` as keys are deleted. In C++ the STL map class template is available to act like a perl hash. After Compline, Zaxo	[reply] [d/l]
use Quantum::Superpositions (was Re: Algorithm Golfing: Intersection of lists) by dragonchild (Archbishop) on Jan 07, 2003 at 15:18 UTC
`use Quantum::Superpositions; my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); my @common = sort {$a<=>$b}eigenstates( all( map { any(@$_) } @sets ) +); print "@common\n";` [download] ------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.	[reply] [d/l]
Re: Algorithm Golfing: Intersection of lists by blokhead (Monsignor) on Jan 07, 2003 at 00:01 UTC
Here's an algorithm that performs the operation, in a slightly more Perl-ish manner (heavy use of `shift`), but destroys the copies of the lists. It could easily be modified not to do so, by using a list of array offsets, but this should give you a start. In any case, it's just a heck of a lot easier in Perl to `shift` and always check the first element than to try to keep track of a list of array offsets. I don't know if it's optimal, but I'm happy with it: use Data::Dumper; my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); my @common = (); ### helpers sub max_element { my $max = -1; for (@sets) { $max = $_->[0] if $_->[0] > $max; } return $max; } sub all_the_same { my $element; for (@sets) { if (not defined $element) { $element = $_->[0]; } else { return 0 if $element != $_->[0]; } } return 1; } ### main loop my $done = 0; while (not $done) { foreach (@sets) { $done++ if @$_ == 0; } if (all_the_same()) { push @common, $sets[0][0]; shift @$_ for @sets; } else { my $max = max_element(); for (@sets) { shift @$_ if $_->[0] < $max; } } ## uncomment these to see how it works step by step: # print Dumper(\@sets, \@common), "\n"; # <STDIN> } print "@common\n"; [download] Here's a pseudocode breakdown of the algorithm, for illustrative purposes. `common_elements ::= empty list; while all lists are non-empty; do if the first elements of all the lists are the same add that element to common_elements else max_element ::= maximum first element of all lists foreach list; do if first element of list < max_element; shift it off the list endif done` [download] Works kinda like a merge-sort by moving down all the lists in parallel. I don't know if that's what yours is trying to do. I can't quite grok it at the moment. Update: After adding some `Dumper` statements to your code, it seems like yours is doing a very similar algorithm, as you said, in a very C++ way. I'm not so sure what to think about the `last MAIN;` statement you have there. ;-) There is at least one part where code could be improved: the `max_element` sub could also return information about which arrays had non-maximum elements, so those arrays could easily be shifted. To do this cleanly and efficiently requires more cleverness than I have, however. HTH, blokhead	[reply] [d/l] [select]
Re: Algorithm Golfing: Intersection of lists by waswas-fng (Curate) on Jan 07, 2003 at 00:23 UTC
`my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); foreach $la (@sets) { %c= map { $_, $c{$_} += 1; } @$la; } foreach $la (sort { $a <=> $b } keys %c) { print "$la is in all sets.\n" if $c{$la} > $#sets; }` [download] -Waswas	[reply] [d/l]
Re: Algorithm Golfing: Intersection of lists by gjb (Vicar) on Jan 07, 2003 at 00:59 UTC
Yet another way to do it: #!perl use strict; use warnings; my @sets = ([ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ]); my @intersection; print join("\n", map(join(" ", @$_), @sets)), "\n\n"; while (@sets == grep {@$_ > 0} @sets) { @sets = sort {$a->[0] <=> $b->[0]} @sets; if ($sets[0]->[0] == $sets[-1]->[0]) { push(@intersection, $sets[0]->[0]); shift(@$_) for @sets; } else { my @ne = grep {$sets[0]->[0] != $_->[0]} @sets; my @eq = grep {$sets[0]->[0] == $_->[0]} @sets; shift(@$_) for @eq; @sets = (@eq, @ne); } } print join(" ", @intersection), "\n"; [download] Just my 2 cents, -gjb-	[reply] [d/l]
Re: Algorithm Golfing: Intersection of lists by jdporter (Paladin) on Jan 07, 2003 at 04:50 UTC
Man, you people sure do like to make things complicated. Here's a function (I'm inclined to say the function, but this is Perl, so, TIMTOWTDI) which computes the list. `sub common_subset { my $a = shift; @_ or return @$a; my %h; @h{@$a} = (); grep { exists $h{$_} } common_subset( @_ ) }` [download] You'd use it like this: `my @sets = ( [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, ], [ 0, 3, 7, 8, 10, 11, 13, 14, 16, ], [ 0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 13, 14, 16, ], [ 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 16, ], [ 0, 1, 2, 3, 7, 8, 9, 10, 11, 13, 14, 15, ], ); my @l = common_subset( @sets );` [download] (This solution should be apparent to anyone who's read SICP.) jdporter The 6th Rule of Perl Club is -- There is no Rule #6.	[reply] [d/l] [select]