Re: Fast, Efficient Union and Intersection on arrays

Hm. I think your routine is broken? I get these results:

c:\test>unionIntersect -N=10 -SHOW
2 2 3 4 5 5 8 9 12 14
4 4 7 7 9 9 10 10 11 11

barvin: 5 : 11
   buk: 2 : 11
[download]

The union contains 11, and the intersection is only 2 (4,9) but your routine reports 5?

Anyway, this is my benchmark which shows a vec implementation is quite a bit faster:

#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

our $N ||= 100;
our $SHOW;

sub barvin {

        my ($in, $jn) = @_;

        my (%int, %uni);
        for my $i (@$in) {
                $uni{$i}++;
        }
        for my $j (@$jn) {
                $int{$j}++ if $uni{$j};
                $uni{$j}++;
        }
        print 'barvin: ', ((scalar keys %int), ' : ',  (scalar keys %u
+ni)) 
             if $SHOW;
        return ((scalar keys %int), (scalar keys %uni));
}

sub  buk{
    my( $aRef, $bRef ) = @_;
    my( $aBits, $bBits ) = ('') x 2;
    vec( $aBits, $_, 1 ) = 1 for @$aRef;
    vec( $bBits, $_, 1 ) = 1 for @$bRef;
    my $uCount = unpack '%32b*', ( $aBits | $bBits );
    my $iCount = unpack '%32b*', ( $aBits & $bBits );
    print 'buk: ', $iCount, ' : ', $uCount if $SHOW;
    return( $iCount, $uCount );
}

our @a = map int( rand $N * 1.5 ), 1 .. $N;
our @b = map int( rand $N * 1.5 ), 1 .. $N;

print "@{[ sort { $a <=> $b } @a ]}\n@{[ sort{ $a <=> $b } @b ]}\n" if
+ $SHOW;
cmpthese $SHOW ? 1 : -1, {
    barvin  => q[
        our( @a, @b );
        barvin( \@a, \@b );
    ],
    buk     => q[
        our( @a, @b );
        buk( \@a, \@b );
    ],
};

__END__
c:\test>unionIntersect -N=10
          Rate barvin    buk
barvin 33212/s     --   -29%
buk    47045/s    42%     --

c:\test>unionIntersect -N=100
         Rate barvin    buk
barvin 4111/s     --   -53%
buk    8777/s   113%     --

c:\test>unionIntersect -N=1000
        Rate barvin    buk
barvin 357/s     --   -64%
buk    998/s   179%     --
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Comment on Re: Fast, Efficient Union and Intersection on arrays Select or Download Code

Replies are listed 'Best First'.
Re^2: Fast, Efficient Union and Intersection on arrays by barvin (Beadle) on Nov 20, 2008 at 18:34 UTC
Ah yes, I'm assuming unique elements in the two hashes, which is correct for my data. Thanks very much for your suggestions. These vectors are new to me, so some good stuff for me to chew on here.	[reply]