in reply to Benchmark of hash emptiness test

perldoc perldata says:

If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to find out whether Perl's internal hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16", which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen.
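As a small illustration of the passage above, here is a minimal sketch of testing a hash for emptiness. The helper name `hash_is_nonempty` is made up for this example. The truth value is stable, but the exact string returned by `scalar(%hash)` depends on the Perl version: older perls give the used/allocated bucket ratio described above, and newer ones may return a key count instead, so no particular output is claimed for the last line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the referenced hash has any
# key/value pairs, 0 otherwise, by using the hash as a condition.
sub hash_is_nonempty {
    my ($href) = @_;
    return %$href ? 1 : 0;
}

my %empty;
my %full = map { $_ => 1 } 1 .. 10_000;

print "empty: ", hash_is_nonempty(\%empty), "\n";
print "full:  ", hash_is_nonempty(\%full),  "\n";

# Version-dependent: a "used/allocated" bucket string on older perls,
# possibly a plain key count on newer ones.
print "scalar(%full) = ", scalar(%full), "\n";
```

If only emptiness matters, the truth value is all that should be relied on; the string form is a diagnostic.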

Re^2: Benchmark of hash emptiness test
by aufflick (Deacon) on Nov 10, 2006 at 00:46 UTC
    So presumably the logic required to determine these stats is what is taking the time (to make it take longer than the keys function).

    Is there a case for saying that a hash in a scalar context should do the simplest (or rather, quickest) operation possible to determine if it is empty or not? There could always be a builtin to return these stats if you really wanted them.

    This is, after all, a very common operation, quite often used within a tight loop.

      The reason for the speed difference is that if (%hash) {} ends up being implemented internally as

      sv = sv_newmortal();
      if (HvFILL((HV*)hv))
          Perl_sv_setpvf(aTHX_ sv, "%ld/%ld",
                         (long)HvFILL(hv), (long)HvMAX(hv) + 1);
      else
          sv_setiv(sv, 0);

      So what happens is that a new SV is created and then populated with a string using something like sprintf.

      This can be compared against the keys %hash option where an extra opcode is executed, BUT that opcode involves creating an SvIV only and therefore requires no memory allocation, no conversion of longs to strings, etc.

      So the bottom line is that if you are concerned about speed, use the keys form. In Perl 5.10 we will try to make this an internal optimisation (internally using if (keys %foo) when the user typed if (%foo)), but it's not exactly a priority, at least not for me :-).
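A rough way to check the claim on your own perl is a sketch like the following, using the core Benchmark module. The sub names `test_scalar` and `test_keys` and the 1,000-key hash are arbitrary choices for this example, not taken from the original benchmark. On perls that include the optimisation discussed here, the two forms may well come out about the same, so no particular result is claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my %hash = map { $_ => 1 } 1 .. 1000;

# Emptiness test using the hash itself as the condition.
sub test_scalar { my $n = 0; $n++ if %hash;      return $n }

# Emptiness test using keys in the condition.
sub test_keys   { my $n = 0; $n++ if keys %hash; return $n }

# A negative count means: run each sub for at least that many CPU seconds.
cmpthese(-1, {
    scalar => \&test_scalar,
    keys   => \&test_keys,
});
```

The subs do almost nothing besides the test itself, so call overhead is a large share of each timing; treat the comparison as indicative only.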

      Update: Well, I gave it a try just to see what was involved, and before I knew it I was sending off patches. So there's a half-decent chance this will be fixed in perl 5.10.

      ---
      $world=~s/war/peace/g

        Pardon my ignorance (I don't know much about the perl internals) but I was under the impression that there was something like boolean context. So shouldn't the hash in if (%hash) {} be able to tell that it was used in boolean context and thus react accordingly? This should then be an easy patch, just defining a different behaviour for %hash in boolean (in contrast to scalar) context.

        -- Hofmator

        Code written by Hofmator and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

        Sweet! I really should get familiar with the perl source. I've read through perlguts illustrated a few times and cracked out a little XS code, but maybe it's time to start getting interested in the perl core. Being able to roll an optimisation into the perl codebase - now that's cool!