This Code is for intersection for union code see Re^3: better union of sets algorithm?

It seems that you can get a lot of improvement. Here are two tests I ran. The first is using two sets each with about half of /usr/dict/words and the third having the word zoo in it.

Union wineglasses zoo
Union wineglasses zoo
            (warning: too few iterations for a reliable count)
         s/iter   sorted unsorted
sorted     4.34       --     -78%
unsorted  0.956     354%       --
Intersection wineglasses zoo
Intersection wineglasses zoo
The data for this run is
@a1 = qw (a b c d g);
@a2 = qw (a b c e d);
@a3 = qw (a b c f g);

Intersection a b c
Intersection a b c
           Rate unsorted   sorted
unsorted  205/s       --     -93%
sorted   2775/s    1251%       --
Intersection a b c
Intersection a b c
So even with a small data set the sorted method is faster. For both methods the sets must have unique elements (as all set do).
sub intersection_1 { my $count = shift; $count--; my %saw; grep($count == $saw{$_}++, @_); } # This code is based on the SortedMultiList code but works # for an intersection of more than two sets. sub intersection_2 { my $self = [@_]; my ($lowest_arr, $lowest, $next); my @ret = (); my $elements = scalar @_; while (1) { my $count = 0; $lowest = $next; if (!defined($lowest)) { # find min element in queue foreach my $a (@$self) { return @ret unless (@$a); if (!defined($lowest) or ($a->[0] lt $lowest)) { $lowest = $a->[0]; } } } foreach my $a (@$self) { return @ret unless (@$a); if ($a->[0] eq $lowest) { shift @$a; $count++; } else { if (!defined($next) or ($a->[0] lt $next)) { $next = $a->[0]; } } } if ($count == $elements) { push(@ret, $lowest); } $lowest = $next; undef $next; } }
Here is the Benchmark code. How the argments are passed may have some effect on the speed. intersection_2 modifies the arrays so I do the copies to make the test more fair.
cmpthese -10, {
    unsorted => q[
        my @x1 = @a1;
        my @x2 = @a2;
        my @x3 = @a3;
        my @out = intersection_1(3, @x1, @x2, @x3);
    ],
    sorted => q[
        my @x1 = @a1;
        my @x2 = @a2;
        my @x3 = @a3;
        my @out = intersection_2(\@x1, \@x2, \@x3);
    ]
};
The SortedMultiList code is at Re: better union of sets algorithm?.
A picture is worth a thousand words, but takes 200K.

In reply to Re: better intersection of sets algorithm? by gam3
in thread better union of sets algorithm? by perrin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.