It seems that you can get a lot of improvement. Here are two tests I ran. The first is using two sets each with about half of /usr/dict/words and the third having the word zoo in it.
Union wineglasses zoo
Union wineglasses zoo
(warning: too few iterations for a reliable count)
s/iter sorted unsorted
sorted 4.34 -- -78%
unsorted 0.956 354% --
Intersection wineglasses zoo
Intersection wineglasses zoo
The data for this run is
@a1 = qw (a b c d g);
@a2 = qw (a b c e d);
@a3 = qw (a b c f g);
Intersection a b c
Intersection a b c
Rate unsorted sorted
unsorted 205/s -- -93%
sorted 2775/s 1251% --
Intersection a b c
Intersection a b c
So even with a small data set the sorted method is faster.
For both methods the sets must have unique elements (as all
set do).
Here is the Benchmark code. How the argments are passed may have some effect on the speed. intersection_2 modifies the arrays so I do the copies to make the test more fair.sub intersection_1 { my $count = shift; $count--; my %saw; grep($count == $saw{$_}++, @_); } # This code is based on the SortedMultiList code but works # for an intersection of more than two sets. sub intersection_2 { my $self = [@_]; my ($lowest_arr, $lowest, $next); my @ret = (); my $elements = scalar @_; while (1) { my $count = 0; $lowest = $next; if (!defined($lowest)) { # find min element in queue foreach my $a (@$self) { return @ret unless (@$a); if (!defined($lowest) or ($a->[0] lt $lowest)) { $lowest = $a->[0]; } } } foreach my $a (@$self) { return @ret unless (@$a); if ($a->[0] eq $lowest) { shift @$a; $count++; } else { if (!defined($next) or ($a->[0] lt $next)) { $next = $a->[0]; } } } if ($count == $elements) { push(@ret, $lowest); } $lowest = $next; undef $next; } }
cmpthese -10, {
unsorted => q[
my @x1 = @a1;
my @x2 = @a2;
my @x3 = @a3;
my @out = intersection_1(3, @x1, @x2, @x3);
],
sorted => q[
my @x1 = @a1;
my @x2 = @a2;
my @x3 = @a3;
my @out = intersection_2(\@x1, \@x2, \@x3);
]
};
The SortedMultiList code is at Re: better union of sets algorithm?.
In reply to Re: better intersection of sets algorithm?
by gam3
in thread better union of sets algorithm?
by perrin
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |