in reply to Fastest way to test membership in a constant set

The set is small? And constant? Forget the hash, you have fallen into an XY trap.

If they contain only alphabetical characters (or some subset) then it is a trivial matter to come up with a character that is unused by any of the set members.

Build a string by joining the values using the delimiter character, and bracket the beginning and end of the string. You can then use index to determine membership. If the set is small enough, this will be faster than any hash or regexp solution.

my $delim = ","; my $set = $delim . join($delim, keys %set) . $delim; sub IsMember { return index($set, "$delim$_[0]$delim") != -1; }

• another intruder with the mooring in the heart of the Perl

  • Comment on Re: Fastest way to test membership in a constant set (don't use a hash)
  • Download Code

Replies are listed 'Best First'.
Re^2: Fastest way to test membership in a constant set (Why not?)
by BrowserUk (Patriarch) on Jul 27, 2007 at 00:03 UTC

    Not the way I tested it?

    #! perl -slw use strict; use Hash::Util qw[ lock_hash unlock_hash ]; use Benchmark qw[ cmpthese ]; our $ITERS ||= -1; our @items = qw[ age old years ]; our @tests = ( @items, map{ $_ . 'x' } @items ); our %set = map{ $_ => 1 } @items; lock_hash %set; our $set = $; . join( $;, @items ) . $;; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], }; @items = map{ $_ x 5 } 'a' .. 'z'; @tests = ( @items, map{ $_ . 'x' } @items ); unlock_hash %set; %set = map{ $_ => 1 } @items; lock_hash %set; $set = $; . join( $;, @items ) . $;; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], }; __END__ ## Check they produce teh same results: C:\test>junk3 -ITERS=1 hash: 3 (warning: too few iterations for a reliable count) index: 3 (warning: too few iterations for a reliable count) Rate hash index hash 1000000000000000/s -- 0% index 1000000000000000/s 0% -- hash: 26 (warning: too few iterations for a reliable count) index: 26 (warning: too few iterations for a reliable count) Rate hash index hash 1000000000000000/s -- 0% index 1000000000000000/s 0% -- ## See how fast C:\test>junk3 Rate index hash index 166820/s -- -31% hash 240174/s 44% -- Rate index hash index 18301/s -- -59% hash 44377/s 142% --

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I had to try your tests with bitmaps. Not as elegant as hashes but a bit faster.
      #! /usr/bin/perl -slw use strict; use Hash::Util qw[ lock_hash unlock_hash ]; use Benchmark qw[ cmpthese ]; our $ITERS ||= -1; our @items = qw[ age old years ]; our @tests = ( @items, map{ $_ . 'x' } @items ); our $a=0b00000000000000000000000001; our $b=0b00000000000000000000000010; our $c=0b00000000000000000000000100; our $d=0b00000000000000000000001000; our $e=0b00000000000000000000010000; our $f=0b00000000000000000000100000; our $g=0b00000000000000000001000000; our $h=0b00000000000000000010000000; our $i=0b00000000000000000100000000; our $j=0b00000000000000001000000000; our $k=0b00000000000000010000000000; our $l=0b00000000000000100000000000; our $m=0b00000000000001000000000000; our $n=0b00000000000010000000000000; our $o=0b00000000000100000000000000; our $p=0b00000000001000000000000000; our $q=0b00000000010000000000000000; our $r=0b00000000100000000000000000; our $s=0b00000001000000000000000000; our $t=0b00000010000000000000000000; our $u=0b00000100000000000000000000; our $v=0b00001000000000000000000000; our $w=0b00010000000000000000000000; our $x=0b00100000000000000000000000; our $y=0b01000000000000000000000000; our $z=0b10000000000000000000000000; our @myitems= ( $a, $b, $c, ); our @mytest= ( @myitems, $d, $e, $f); our $myset= $a | $b | $c; our %set = map{ $_ => 1 } @items; lock_hash %set; our $set = $; . join( $;, @items ) . $;; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], bitmap => q[ my $count; $myset & $_ and ++$count for @mytest; print "bitmap: $count" if $ITERS == 1; ], }; @items = map{ $_ x 5 } 'a' .. 'z'; @tests = ( @items, map{ $_ . 'x' } @items ); unlock_hash %set; %set = map{ $_ => 1 } @items; lock_hash %set; $set = $; . join( $;, @items ) . $;; @myitems= ( $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n, $o, $p, $q, $r, $s, $t, $u, $v, $w, $x, $y, $z, ); @mytest= ( @myitems, map { 0b0 } @myitems); $myset= 0b11111111111111111111111111; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], bitmap => q[ my $count; $myset & $_ and ++$count for @mytest; print "bitmap: $count" if $ITERS == 1; ], };
      Results on my machine:
      Rate index hash bitmap index 210823/s -- -51% -57% hash 432785/s 105% -- -13% bitmap 494700/s 135% 14% -- Rate index hash bitmap index 20276/s -- -73% -81% hash 75918/s 274% -- -28% bitmap 105217/s 419% 39% --

        Yeah! That cool 'n all, but you're hardly testing like with like. Show me where you code is looking up the words 'age', 'old' and 'years'?

        In order to make bitsets practical you have to have a mechanism for mapping the real values (words in this case), with their representative bit patterns. The obvious, and fastest way to do that mapping is to use a hash. And by the time you've done that, you might as well discard the bitset because that hash is doing all the real work, and doing it twice will just be slower.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Fastest way to test membership in a constant set (don't use a hash)
by ysth (Canon) on Jul 31, 2007 at 19:12 UTC
    my $set = $delim . join($delim, keys %set) . $delim;
    When I need to join and have the delimeter also at the beginning or ending or both, I use empty strings:
    my $set = join( $delim, "", keys %set, "" );
    Just another way to do it.