in reply to Re: Fastest way to test membership in a constant set (don't use a hash)
in thread Fastest way to test membership in a constant set

Not the way I tested it?

#! perl -slw use strict; use Hash::Util qw[ lock_hash unlock_hash ]; use Benchmark qw[ cmpthese ]; our $ITERS ||= -1; our @items = qw[ age old years ]; our @tests = ( @items, map{ $_ . 'x' } @items ); our %set = map{ $_ => 1 } @items; lock_hash %set; our $set = $; . join( $;, @items ) . $;; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], }; @items = map{ $_ x 5 } 'a' .. 'z'; @tests = ( @items, map{ $_ . 'x' } @items ); unlock_hash %set; %set = map{ $_ => 1 } @items; lock_hash %set; $set = $; . join( $;, @items ) . $;; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], }; __END__ ## Check they produce teh same results: C:\test>junk3 -ITERS=1 hash: 3 (warning: too few iterations for a reliable count) index: 3 (warning: too few iterations for a reliable count) Rate hash index hash 1000000000000000/s -- 0% index 1000000000000000/s 0% -- hash: 26 (warning: too few iterations for a reliable count) index: 26 (warning: too few iterations for a reliable count) Rate hash index hash 1000000000000000/s -- 0% index 1000000000000000/s 0% -- ## See how fast C:\test>junk3 Rate index hash index 166820/s -- -31% hash 240174/s 44% -- Rate index hash index 18301/s -- -59% hash 44377/s 142% --

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Replies are listed 'Best First'.
Re^3: Fastest way to test membership in a constant set (Why not?)
by hipe (Sexton) on Jul 31, 2007 at 12:09 UTC
    I had to try your tests with bitmaps. Not as elegant as hashes but a bit faster.
    #! /usr/bin/perl -slw use strict; use Hash::Util qw[ lock_hash unlock_hash ]; use Benchmark qw[ cmpthese ]; our $ITERS ||= -1; our @items = qw[ age old years ]; our @tests = ( @items, map{ $_ . 'x' } @items ); our $a=0b00000000000000000000000001; our $b=0b00000000000000000000000010; our $c=0b00000000000000000000000100; our $d=0b00000000000000000000001000; our $e=0b00000000000000000000010000; our $f=0b00000000000000000000100000; our $g=0b00000000000000000001000000; our $h=0b00000000000000000010000000; our $i=0b00000000000000000100000000; our $j=0b00000000000000001000000000; our $k=0b00000000000000010000000000; our $l=0b00000000000000100000000000; our $m=0b00000000000001000000000000; our $n=0b00000000000010000000000000; our $o=0b00000000000100000000000000; our $p=0b00000000001000000000000000; our $q=0b00000000010000000000000000; our $r=0b00000000100000000000000000; our $s=0b00000001000000000000000000; our $t=0b00000010000000000000000000; our $u=0b00000100000000000000000000; our $v=0b00001000000000000000000000; our $w=0b00010000000000000000000000; our $x=0b00100000000000000000000000; our $y=0b01000000000000000000000000; our $z=0b10000000000000000000000000; our @myitems= ( $a, $b, $c, ); our @mytest= ( @myitems, $d, $e, $f); our $myset= $a | $b | $c; our %set = map{ $_ => 1 } @items; lock_hash %set; our $set = $; . join( $;, @items ) . $;; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], bitmap => q[ my $count; $myset & $_ and ++$count for @mytest; print "bitmap: $count" if $ITERS == 1; ], }; @items = map{ $_ x 5 } 'a' .. 'z'; @tests = ( @items, map{ $_ . 'x' } @items ); unlock_hash %set; %set = map{ $_ => 1 } @items; lock_hash %set; $set = $; . join( $;, @items ) . $;; @myitems= ( $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n, $o, $p, $q, $r, $s, $t, $u, $v, $w, $x, $y, $z, ); @mytest= ( @myitems, map { 0b0 } @myitems); $myset= 0b11111111111111111111111111; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], bitmap => q[ my $count; $myset & $_ and ++$count for @mytest; print "bitmap: $count" if $ITERS == 1; ], };
    Results on my machine:
    Rate index hash bitmap index 210823/s -- -51% -57% hash 432785/s 105% -- -13% bitmap 494700/s 135% 14% -- Rate index hash bitmap index 20276/s -- -73% -81% hash 75918/s 274% -- -28% bitmap 105217/s 419% 39% --

      Yeah! That cool 'n all, but you're hardly testing like with like. Show me where you code is looking up the words 'age', 'old' and 'years'?

      In order to make bitsets practical you have to have a mechanism for mapping the real values (words in this case), with their representative bit patterns. The obvious, and fastest way to do that mapping is to use a hash. And by the time you've done that, you might as well discard the bitset because that hash is doing all the real work, and doing it twice will just be slower.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Bitsets are often used for representation of sets. That is the reason I was curious to see how they compare with hashes.

        I agree with you that for this and probably most of the cases, hash is a better representation for a set. In case of a lottery, for example, bitsets might be better.