andreas1234567 has asked for the wisdom of the Perl Monks concerning the following question:

Monks,
I want to know if a hash is empty. Since I'll be doing it very often, I want is to perform as fast as possible. The defined section of perlfunc says:
if (%a_hash) { print "has hash members\n" }
To my surprise Benchmark, says that if(keys %hash) is roughy 5 times faster than if(%hash).
Less surprisingly, it also seems that if (scalar(%hash)) and if (%hash) performs equally fast, and that if (scalar(keys %hash)) and if(keys %hash) also performs equally fast.
#!/usr/bin/perl -sw use strict; use Benchmark; our $HASHSIZE ||= 1e3; our $ITERATIONS ||= 1e7; my %hash = (); # Init hash for (my $i=0; $i<$HASHSIZE; $i++) { $hash{$i} = 1; } timethese($ITERATIONS, { _keys => sub { if (keys %hash) { ; } }, _scalar_keys => sub { if (scalar(keys %hash)) { ; } }, _scalar => sub { if (scalar(%hash)) { ; } }, _if => sub { if (%hash) { ; } }, } ); __END__ =pod =head1 NAME bm_hash_emptiness.pl - benchmark hash emptiness tests. =head1 SYNOPSIS ./bm_hash_emptiness.pl -ITERATIONS=1e7 -HASHSIZE=1e4 =head1 RESULTS $ ./bm_hash_emptiness.pl -ITERATIONS=1e7 -HASHSIZE=1e4 Benchmark: timing 1e7 iterations of _if, _keys, _scalar, _scalar_key +s... _if: 14 wallclock secs (14.65 usr + 0.01 sys = 14.66 CPU) _keys: 2 wallclock secs ( 1.99 usr + 0.00 sys = 1.99 CPU) _scalar: 16 wallclock secs (15.03 usr + 0.01 sys = 15.04 CPU) _scalar_keys: 2 wallclock secs ( 2.40 usr + 0.00 sys = 2.40 CPU) =head SEE ALSO perlfunc =cut


Should I stick to the advice from perlfunc and use if (%hash) even when it seems to be slower than the alternative?

--
Andreas

Replies are listed 'Best First'.
Re: Benchmark of hash emptiness test
by GrandFather (Saint) on Nov 09, 2006 at 10:43 UTC

    I can't answer your question, however it is interesting to consider the two cases:

    use strict; use warnings; use Benchmark qw(cmpthese); my $HASHSIZE = 1e3; my %hash; test (); @hash{1..$HASHSIZE} = () x $HASHSIZE; test (); sub test { my $_keys = keys %hash; my $_scalar_keys = scalar(keys %hash); my $_scalar = scalar(%hash); my $_if = %hash; print "_keys: $_keys\n"; print "_scalar_keys: $_scalar_keys\n"; print "_scalar: $_scalar\n"; print "_if: $_if\n"; cmpthese(-1, { _keys => sub {if (keys %hash) { ; }}, _scalar_keys => sub {if (scalar(keys %hash)) { ; }}, _scalar => sub {if (scalar(%hash)) { ; }}, _if => sub {if (%hash) { ; }}, } ); }

    Prints:

    _keys: 0 _scalar_keys: 0 _scalar: 0 _if: 0 Rate _scalar _if _keys _scalar_ +keys _scalar 5217686/s -- -10% -15% +-18% _if 5766838/s 11% -- -6% +-10% _keys 6136894/s 18% 6% -- + -4% _scalar_keys 6387143/s 22% 11% 4% + -- _keys: 1000 _scalar_keys: 1000 _scalar: 629/1024 _if: 629/1024 Rate _scalar _if _keys _scalar_ +keys _scalar 379905/s -- -1% -92% +-92% _if 384350/s 1% -- -92% +-92% _keys 4930608/s 1198% 1183% -- + -1% _scalar_keys 4996935/s 1215% 1200% 1% + --

    Note that I prefer to use cmpthese and that unless really huge numbers of tests are made without anything much else going on, the time is sufficiently small as to not matter anyway.


    DWIM is Perl's answer to Gödel
Re: Benchmark of hash emptiness test
by dk (Chaplain) on Nov 09, 2006 at 22:49 UTC
    perldoc perldata says:

    If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to find out whether Perl's internal hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16", which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen.

      So presumably the logic required to determine these stats is what is taking the time (to make it take longer than the keys function).

      Is there a case for saying that a hash in a scalar context should do the simplest (or rather, quickest) operation possible to determine if it is empty or not? There could always be a builtin to return these stats if you really wanted them.

      This is, after all, a very common operaion quite often used within a tight loop.

        The reason for the speed difference is because if (%hash) {} ends being implemented internally as

        sv = sv_newmortal(); if (HvFILL((HV*)hv)) Perl_sv_setpvf(aTHX_ sv, "%ld/%ld", (long)HvFILL(hv), (long)HvMAX(hv) + 1); else sv_setiv(sv, 0);

        So what happens is a new SV is created, its then populated with a string using something like sprintf.

        This can be compared against the keys %hash option where an extra opcode is executed, BUT that opcode involves creating an SvIV only and therefore requires no memory allocation, no conversion of longs to strings, etc.

        So the bottom line is that if you are concerned about speed use the keys form. In Perl 5.10 we will try to make this an internal optimisation, (internally using if (keys %foo) when the user typed if (%foo) )but its not exactly priority, at least not for me :-).

        Update: Well, I gave it a try just to see what was involved, and before I knew it I was sending off patches. So theres a half decent chance this will be fixed in perl 5.10

        ---
        $world=~s/war/peace/g

Re: Benchmark of hash emptiness test
by graff (Chancellor) on Nov 09, 2006 at 17:25 UTC
    Should I stick to the advice from perlfunc and use if (%hash) even when it seems to be slower than the alternative?

    I wouldn't view that example from the perlfunc man page as "advice", but simply as an explanatory example. It's just telling you what it means for a %hash variable to be "defined" vs. "not defined"; it's there to clarify the relationship between "having keys" and "being defined", as this applies to hashes. (update: I should have looked more closely first -- yes, the phrasing does make it look like it's intended to be "advice", but if you can find a better way, that's great! Thanks for telling us about it.)

    Use whatever idiom works best for your app. That's what benchmarking is all about.