Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Serene Monks,

My problem is this: I want to test membership in a small set, many thousands of times. The set is a constant and I want my program to enforce that fact -- e.g. I don't want to use a global variable, unless I can guarantee that it is constant.

But, I don't have the ability to install and use Readonly::XS on my system.

My first try:

sub IsMember { my %Set = ( 'age' => 1, 'old' => 1, 'years' => 1); return $Set{$_[0]}; }

This seems to work okay, but profiling shows a significant amount of time in this subfunction (about 11 percent of my total run time.)

I also tried the same thing with a closure, just in case the %Set was being created on the stack each time:

{ my %Set = ( 'age' => 1, 'old' => 1, 'years' => 1); sub IsMember { return $Set{$_[0]}; } }
I also tried a regex, but still no significant speedup:

sub IsMember { return $_[0] =~ m/age|old|years/; }
I want to use a global constant hash and then access it directly (instead of through a function wrapper). But I cannot figure this out. I have perl 5.6.1 (and again, Readonly:XS is not an option at this time.) If no speedup can be expected - any suggestions on the best implementation of a constant hash (given my constraints) would be enormously helpful. - Student

Replies are listed 'Best First'.
Re: Fastest way to test membership in a constant set
by BrowserUk (Patriarch) on Jul 26, 2007 at 14:40 UTC

    Use a restricted hash and use exists directly, rather than via a subroutine. This will be far faster than any other method.

    #! perl -slw use strict; use Hash::Util qw[ lock_hash ]; my %set = map{ $_ => 1 } qw[ age old years ]; lock_hash %set; for my $unknown ( qw[ age sex old young years minutes] ) { print "$unknown ", exists $set{ $unknown } ? 'exists' : ' does not exist'; } $set{ fred } = 1; __END__ c:\test>junk3 age exists sex does not exist old exists young does not exist years exists minutes does not exist Attempt to access disallowed key 'fred' in a restricted hash at c:\tes +t\junk3.pl line 15.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Fastest way to test membership in a constant set
by Corion (Patriarch) on Jul 26, 2007 at 14:31 UTC

    You don't show us how using a global hash fails for you, but here is an approach:

    use strict; use vars qw(%IsMember); %IsMember = ('age' => 1, 'old' => 1, 'years' => 1);

    Use it as:

    if ($IsMember{age}) { print "'age' is a member\n" }; if ($IsMember{ego}) { print "'ego' is a member\n" };

    For development, you can lock the hash keys by using Tie::Hash::FixedKeys.

      Why "use vars qw(%IsMember)" instead of my %IsMember or our %IsMember?

      -Paul

        Because usually, having a global lexical variable is just bad when you need to go around and change it and our basically is a misfeature in my opinion - it doesn't work on 5.005 should the code need to be used on that platform and it doesn't offer any advantage over use vars().

Re: Fastest way to test membership in a constant set (don't use a hash)
by grinder (Bishop) on Jul 26, 2007 at 22:24 UTC

    The set is small? And constant? Forget the hash, you have fallen into an XY trap.

    If they contain only alphabetical characters (or some subset) then it is a trivial matter to come up with a character that is unused by any of the set members.

    Build a string by joining the values using the delimiter character, and bracket the beginning and end of the string. You can then use index to determine membership. If the set is small enough, this will be faster than any hash or regexp solution.

    my $delim = ","; my $set = $delim . join($delim, keys %set) . $delim; sub IsMember { return index($set, "$delim$_[0]$delim") != -1; }

    • another intruder with the mooring in the heart of the Perl

      Not the way I tested it?

      #! perl -slw use strict; use Hash::Util qw[ lock_hash unlock_hash ]; use Benchmark qw[ cmpthese ]; our $ITERS ||= -1; our @items = qw[ age old years ]; our @tests = ( @items, map{ $_ . 'x' } @items ); our %set = map{ $_ => 1 } @items; lock_hash %set; our $set = $; . join( $;, @items ) . $;; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], }; @items = map{ $_ x 5 } 'a' .. 'z'; @tests = ( @items, map{ $_ . 'x' } @items ); unlock_hash %set; %set = map{ $_ => 1 } @items; lock_hash %set; $set = $; . join( $;, @items ) . $;; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], }; __END__ ## Check they produce teh same results: C:\test>junk3 -ITERS=1 hash: 3 (warning: too few iterations for a reliable count) index: 3 (warning: too few iterations for a reliable count) Rate hash index hash 1000000000000000/s -- 0% index 1000000000000000/s 0% -- hash: 26 (warning: too few iterations for a reliable count) index: 26 (warning: too few iterations for a reliable count) Rate hash index hash 1000000000000000/s -- 0% index 1000000000000000/s 0% -- ## See how fast C:\test>junk3 Rate index hash index 166820/s -- -31% hash 240174/s 44% -- Rate index hash index 18301/s -- -59% hash 44377/s 142% --

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        I had to try your tests with bitmaps. Not as elegant as hashes but a bit faster.
        #! /usr/bin/perl -slw use strict; use Hash::Util qw[ lock_hash unlock_hash ]; use Benchmark qw[ cmpthese ]; our $ITERS ||= -1; our @items = qw[ age old years ]; our @tests = ( @items, map{ $_ . 'x' } @items ); our $a=0b00000000000000000000000001; our $b=0b00000000000000000000000010; our $c=0b00000000000000000000000100; our $d=0b00000000000000000000001000; our $e=0b00000000000000000000010000; our $f=0b00000000000000000000100000; our $g=0b00000000000000000001000000; our $h=0b00000000000000000010000000; our $i=0b00000000000000000100000000; our $j=0b00000000000000001000000000; our $k=0b00000000000000010000000000; our $l=0b00000000000000100000000000; our $m=0b00000000000001000000000000; our $n=0b00000000000010000000000000; our $o=0b00000000000100000000000000; our $p=0b00000000001000000000000000; our $q=0b00000000010000000000000000; our $r=0b00000000100000000000000000; our $s=0b00000001000000000000000000; our $t=0b00000010000000000000000000; our $u=0b00000100000000000000000000; our $v=0b00001000000000000000000000; our $w=0b00010000000000000000000000; our $x=0b00100000000000000000000000; our $y=0b01000000000000000000000000; our $z=0b10000000000000000000000000; our @myitems= ( $a, $b, $c, ); our @mytest= ( @myitems, $d, $e, $f); our $myset= $a | $b | $c; our %set = map{ $_ => 1 } @items; lock_hash %set; our $set = $; . join( $;, @items ) . $;; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], bitmap => q[ my $count; $myset & $_ and ++$count for @mytest; print "bitmap: $count" if $ITERS == 1; ], }; @items = map{ $_ x 5 } 'a' .. 'z'; @tests = ( @items, map{ $_ . 'x' } @items ); unlock_hash %set; %set = map{ $_ => 1 } @items; lock_hash %set; $set = $; . join( $;, @items ) . $;; @myitems= ( $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n, $o, $p, $q, $r, $s, $t, $u, $v, $w, $x, $y, $z, ); @mytest= ( @myitems, map { 0b0 } @myitems); $myset= 0b11111111111111111111111111; cmpthese $ITERS, { hash => q[ my $count; exists $set{ $_ } and ++$count for @tests; print "hash: $count" if $ITERS == 1; ], index => q[ my $count; 1+index $set, $; . $_ . $; and ++$count for @tests; print "index: $count" if $ITERS == 1; ], bitmap => q[ my $count; $myset & $_ and ++$count for @mytest; print "bitmap: $count" if $ITERS == 1; ], };
        Results on my machine:
        Rate index hash bitmap index 210823/s -- -51% -57% hash 432785/s 105% -- -13% bitmap 494700/s 135% 14% -- Rate index hash bitmap index 20276/s -- -73% -81% hash 75918/s 274% -- -28% bitmap 105217/s 419% 39% --
      my $set = $delim . join($delim, keys %set) . $delim;
      When I need to join and have the delimeter also at the beginning or ending or both, I use empty strings:
      my $set = join( $delim, "", keys %set, "" );
      Just another way to do it.
Re: Fastest way to test membership in a constant set
by runrig (Abbot) on Jul 26, 2007 at 16:53 UTC
    You don't need a readonly hash if you just use a closure...no code outside the block can change the hash:
    BEGIN { my %members; undef @members{qw(age old years)}; sub is_member { return exists $members{$_[0]}; } }
Re: Fastest way to test membership in a constant set
by FunkyMonk (Bishop) on Jul 26, 2007 at 14:27 UTC
Re: Fastest way to test membership in a constant set
by Anonymous Monk on Jul 26, 2007 at 15:24 UTC
    To answer your questions: 1. I could use non-XS Readonly for non-production (or Tie::Hash::FixedKeys as well). The desire for constness comes from wanting to write code that is self-documenting and abuse-preventing. In our environment, hacks and cut-n-paste from production code can (and does) happen. I want to minimize the hacks that will eventually occur but at the same time maximize speed. Sadly, I don't think my version of perl supports Hash::Util.
      Sadly, I don't think my version of perl supports Hash::Util

      You're using older than 5.8.0? Isn't that at least 5 years old?

      I'm not sure, and a quick scan of perldelta didn't tell me, when Internals first appeared, but it can do the same job with a minor additional amount of work. Once you have built your hash, you have to set the values readonly as well as the hash itself. Hardly onerous though:

      #! perl -slw use strict; use Internals qw[ SetReadOnly ]; my %set = map{ $_ => 1 } qw[ age old years ]; SetReadOnly( \$_ ) for values %set; ## Freeze values SetReadOnly \%set; ## and keys. for my $unknown ( qw[ age sex old young years minutes] ) { print "$unknown ", exists $set{ $unknown } ? 'exists' : ' does not exist'; } eval{ $set{ age } = 123 } or print $@; eval{ $set{ fred } = 1 } or print $@; __END__ C:\test>junk3 age exists sex does not exist old exists young does not exist years exists minutes does not exist Modification of a read-only value attempted at C:\test\junk3.pl line 1 +7. Attempt to access disallowed key 'fred' in a restricted hash at C:\tes +t\junk3.pl line 18.

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Fastest way to test membership in a constant set
by Cabrion (Friar) on Jul 27, 2007 at 11:09 UTC
    Bear with me because this is going to seem silly at first. What about a completely hard coded approach? From the original post, the goal seems to be speed in addtion to being constant. Why bother using hashes at all? The following subroutine should be quite a bit faster than what you have, thought I admit I haven't compared the two.
    sub IsMemeber { if $_[0] eq 'age' return 1; if $_[0] eq 'old' return 1; if $_[0] eq 'years' return 1; return 0; }
    You could probably speed up the code even more by in-lining the if statements instead of enduring the overhead of a function call.

      This is fine if you have just a few keys. However, this approach is O(n) (i.e. execution time increases linearly with the number of items in the set), whereas a hash has approximately constant lookup time — or O(1).

      Even for only 10 items, a hash is considerably faster:

      #!/usr/bin/perl -w my $nkeys = 10; my @keys; our $key; # (keep last key for lookup below) for (1..$nkeys) { # generate random keys consisting of 3..10 chars $key = join "", map chr(97+int(rand(26))), (1..3+int(rand(8))); push @keys, $key; } # setup 'if'-code my $code = "sub isMember_if {\n" . join("", map "return 1 if \$_[0] eq '$_';\n", @keys) . "return 0;}"; #print "$code\n"; eval $code; # setup hash my %hash = map { $_ => 1 } @keys; sub isMember_hash { return $hash{$_[0]} }; use Benchmark 'cmpthese'; cmpthese( -1, { 'if' => 'isMember_if($key)', 'hash' => 'isMember_hash($key)' });

      On my machine, this prints (results vary slightly, depending on random keys being used)

      Rate if hash if 628278/s -- -68% hash 1946613/s 210% --

      Update: Actually, Perl's hashes are blazingly fast: even for just one key/comparison ($nkeys = 1), the if-approach still doesn't outperform the hash; and with key lengths of more than about 3 chars, the hash lookup is even noticeably faster (which seems to indicate that comparing two identical strings is slower than computing the hashed value of that string...).

        Perl never ceases to impress me. Also, shame on me for not doing my own benchmarks. I did know the if-approach would not scale well, but the OP seemed to be looking for a solution to a problem who's "search domain" was relatively small (3 elements in the example). But you have brought to light something I would never have expected.
Re: Fastest way to test membership in a constant set
by Anonymous Monk on Jul 26, 2007 at 14:25 UTC
    Okay - it seems like the closure method is the fastest at this time. But, can a use constant be used to define a constant set in Perl 5.6.1?
      constant, example 3.

      See the update here.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of