in reply to Re^2: Fastest way to test membership in a constant set (Why not?)
in thread Fastest way to test membership in a constant set

I had to try your tests with bitmaps. Not as elegant as hashes but a bit faster.
#! /usr/bin/perl -slw use strict; use Hash::Util qw[ lock_hash unlock_hash ]; use Benchmark qw[ cmpthese ]; our $ITERS ||= -1; our @items = qw[ age old years ]; our @tests = ( @items, map{ $_ . 'x' } @items ); our $a=0b00000000000000000000000001; our $b=0b00000000000000000000000010; our $c=0b00000000000000000000000100; our $d=0b00000000000000000000001000; our $e=0b00000000000000000000010000; our $f=0b00000000000000000000100000; our $g=0b00000000000000000001000000; our $h=0b00000000000000000010000000; our $i=0b00000000000000000100000000; our $j=0b00000000000000001000000000; our $k=0b00000000000000010000000000; our $l=0b00000000000000100000000000; our $m=0b00000000000001000000000000; our $n=0b00000000000010000000000000; our $o=0b00000000000100000000000000; our $p=0b00000000001000000000000000; our $q=0b00000000010000000000000000; our $r=0b00000000100000000000000000; our $s=0b00000001000000000000000000; our $t=0b00000010000000000000000000; our $u=0b00000100000000000000000000; our $v=0b00001000000000000000000000; our $w=0b00010000000000000000000000; our $x=0b00100000000000000000000000; our $y=0b01000000000000000000000000; our $z=0b10000000000000000000000000; our @myitems= ( $a, $b, $c, ); our @mytest= ( @myitems, $d, $e, $f); our $myset= $a | $b | $c; our %set = map{ $_ => 1 } @items; lock_hash %set; our $set = $; . join( $;, @items ) . $;; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], bitmap => q[ my $count; $myset & $_ and ++$count for @mytest; print "bitmap: $count" if $ITERS == 1; ], }; @items = map{ $_ x 5 } 'a' .. 'z'; @tests = ( @items, map{ $_ . 'x' } @items ); unlock_hash %set; %set = map{ $_ => 1 } @items; lock_hash %set; $set = $; . join( $;, @items ) . $;; @myitems= ( $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n, $o, $p, $q, $r, $s, $t, $u, $v, $w, $x, $y, $z, ); @mytest= ( @myitems, map { 0b0 } @myitems); $myset= 0b11111111111111111111111111; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], bitmap => q[ my $count; $myset & $_ and ++$count for @mytest; print "bitmap: $count" if $ITERS == 1; ], };
Results on my machine:
Rate index hash bitmap index 210823/s -- -51% -57% hash 432785/s 105% -- -13% bitmap 494700/s 135% 14% -- Rate index hash bitmap index 20276/s -- -73% -81% hash 75918/s 274% -- -28% bitmap 105217/s 419% 39% --

Replies are listed 'Best First'.
Re^4: Fastest way to test membership in a constant set (Why not?)
by BrowserUk (Patriarch) on Jul 31, 2007 at 14:33 UTC

    Yeah! That cool 'n all, but you're hardly testing like with like. Show me where you code is looking up the words 'age', 'old' and 'years'?

    In order to make bitsets practical you have to have a mechanism for mapping the real values (words in this case), with their representative bit patterns. The obvious, and fastest way to do that mapping is to use a hash. And by the time you've done that, you might as well discard the bitset because that hash is doing all the real work, and doing it twice will just be slower.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Bitsets are often used for representation of sets. That is the reason I was curious to see how they compare with hashes.

      I agree with you that for this and probably most of the cases, hash is a better representation for a set. In case of a lottery, for example, bitsets might be better.