in reply to Fast, Efficient Union and Intersection on arrays

Hm. I think your routine is broken? I get these results:

c:\test>unionIntersect -N=10 -SHOW 2 2 3 4 5 5 8 9 12 14 4 4 7 7 9 9 10 10 11 11 barvin: 5 : 11 buk: 2 : 11

The union contains 11, and the intersection is only 2 (4,9) but your routine reports 5?

Anyway, this is my benchmark which shows a vec implementation is quite a bit faster:

#! perl -slw use strict; use Benchmark qw[ cmpthese ]; our $N ||= 100; our $SHOW; sub barvin { my ($in, $jn) = @_; my (%int, %uni); for my $i (@$in) { $uni{$i}++; } for my $j (@$jn) { $int{$j}++ if $uni{$j}; $uni{$j}++; } print 'barvin: ', ((scalar keys %int), ' : ', (scalar keys %u +ni)) if $SHOW; return ((scalar keys %int), (scalar keys %uni)); } sub buk{ my( $aRef, $bRef ) = @_; my( $aBits, $bBits ) = ('') x 2; vec( $aBits, $_, 1 ) = 1 for @$aRef; vec( $bBits, $_, 1 ) = 1 for @$bRef; my $uCount = unpack '%32b*', ( $aBits | $bBits ); my $iCount = unpack '%32b*', ( $aBits & $bBits ); print 'buk: ', $iCount, ' : ', $uCount if $SHOW; return( $iCount, $uCount ); } our @a = map int( rand $N * 1.5 ), 1 .. $N; our @b = map int( rand $N * 1.5 ), 1 .. $N; print "@{[ sort { $a <=> $b } @a ]}\n@{[ sort{ $a <=> $b } @b ]}\n" if + $SHOW; cmpthese $SHOW ? 1 : -1, { barvin => q[ our( @a, @b ); barvin( \@a, \@b ); ], buk => q[ our( @a, @b ); buk( \@a, \@b ); ], }; __END__ c:\test>unionIntersect -N=10 Rate barvin buk barvin 33212/s -- -29% buk 47045/s 42% -- c:\test>unionIntersect -N=100 Rate barvin buk barvin 4111/s -- -53% buk 8777/s 113% -- c:\test>unionIntersect -N=1000 Rate barvin buk barvin 357/s -- -64% buk 998/s 179% --

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Replies are listed 'Best First'.
Re^2: Fast, Efficient Union and Intersection on arrays
by barvin (Beadle) on Nov 20, 2008 at 18:34 UTC
    Ah yes, I'm assuming unique elements in the two hashes, which is correct for my data. Thanks very much for your suggestions. These vectors are new to me, so some good stuff for me to chew on here.