comment on

This Code is for intersection for union code see Re^3: better union of sets algorithm?

It seems that you can get a lot of improvement. Here are two tests I ran. The first is using two sets each with about half of /usr/dict/words and the third having the word zoo in it.

Union wineglasses zoo
Union wineglasses zoo
            (warning: too few iterations for a reliable count)
         s/iter   sorted unsorted
sorted     4.34       --     -78%
unsorted  0.956     354%       --
Intersection wineglasses zoo
Intersection wineglasses zoo

The data for this run is

@a1 = qw (a b c d g);
@a2 = qw (a b c e d);
@a3 = qw (a b c f g);

Intersection a b c
Intersection a b c
           Rate unsorted   sorted
unsorted  205/s       --     -93%
sorted   2775/s    1251%       --
Intersection a b c
Intersection a b c

So even with a small data set the sorted method is faster. For both methods the sets must have unique elements (as all set do).

sub intersection_1
{
  my $count = shift;
  $count--;
  my %saw;

  grep($count == $saw{$_}++, @_);
}

# This code is based on the SortedMultiList code but works
# for an intersection of more than two sets.
sub intersection_2
{
  my $self = [@_];
  my ($lowest_arr, $lowest, $next);
  my @ret = ();
  my $elements = scalar @_;

  while (1) {
    my $count = 0;
    $lowest = $next;
    if (!defined($lowest)) {
      # find min element in queue
      foreach my $a (@$self)
      {
        return @ret unless (@$a);
        if (!defined($lowest) or ($a->[0] lt $lowest))
        {
          $lowest = $a->[0];
        }
      }
    }
    foreach my $a (@$self)
    {
      return @ret unless (@$a);
      if ($a->[0] eq $lowest)
      {
         shift @$a;
         $count++;
      } else {
        if (!defined($next) or ($a->[0] lt $next))
        {
          $next = $a->[0];
        }
      }
    }
    if ($count == $elements) {
        push(@ret, $lowest);
    }
    $lowest = $next;
    undef $next;
  }
}
[download]

Here is the Benchmark code. How the argments are passed may have some effect on the speed. intersection_2 modifies the arrays so I do the copies to make the test more fair.

cmpthese -10, {
    unsorted => q[
        my @x1 = @a1;
        my @x2 = @a2;
        my @x3 = @a3;
        my @out = intersection_1(3, @x1, @x2, @x3);
    ],
    sorted => q[
        my @x1 = @a1;
        my @x2 = @a2;
        my @x3 = @a3;
        my @out = intersection_2(\@x1, \@x2, \@x3);
    ]
};

The SortedMultiList code is at Re: better union of sets algorithm?.

A picture is worth a thousand words, but takes 200K.

In reply to Re: better intersection of sets algorithm? by gam3
in thread better union of sets algorithm? by perrin

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.