If your values are integers in a sane range (ie. reasonably low) you can save a little time, upto around 60% or so, by using a bit vector instead of a hash:
#! perl -slw use strict; use Benchmark qw[ cmpthese ]; our $B ||= 32; our $N ||= 1000; our $S ||= 100; our $R ||= $N * $S; our @sets = map{ [ map int rand $R, 1 .. $N ] } 1 .. $S; our( @hUniq, @vUniq ); cmpthese -2, { hash => q[ my %seen; undef @hUniq; @hUniq = grep{ $seen{ $_ }++ } @$_ for @sets; ], vec => q[ my $vector = ''; undef @vUniq; @vUniq = grep{ vec( $vector, $_, $B )++ } @$_ for @sets; ], }; print 'H:'. @hUniq, ' V: '. @vUniq; __END__ P:\test>438536 Rate hash vec hash 2.69/s -- -23% vec 3.47/s 29% -- H:954 V: 954 P:\test>438536 -S=20 Rate hash vec hash 16.4/s -- -41% vec 27.5/s 68% -- H:639 V: 639 P:\test>438536 -S=20 -N=100 Rate hash vec hash 264/s -- -21% vec 336/s 27% -- H:61 V: 61 P:\test>438536 -S=20 -N=2000 Rate hash vec hash 7.33/s -- -36% vec 11.5/s 56% -- H:1407 V: 1407 P:\test>438536 -S=20 -N=2000 -B=16 Rate hash vec hash 7.06/s -- -40% vec 11.8/s 67% -- H:1399 V: 1399
If you could build and manipulate your sets as bit vectors, by storing your uniq values in an hash as you get them and setting bits in the vectors to represent them, ORing the bit vectors will get your union. You then use the set-bit positions as indexes into your array of unique values to reconstitute the union set.
It works for any type of values and is very fast provided you can build and work with your sets that way.
In reply to Re: better union of sets algorithm?
by BrowserUk
in thread better union of sets algorithm?
by perrin
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |