in reply to Re^3: What is the most efficient way to see if a hash is empty?
in thread What is the most efficient way to see if a hash is empty?

Thanks for the links to Perl source code and the dump.

Key counts being stored would be consistent with BrowserUk's observations that there seems to be little difference between scalar keys on an empty and full hash. But scalar(%h) is returning counts of used and allocated buckets, not key counts.

Given moritz's observations of a significant difference in empty and non-empty hashes for %h and the lack of an obvious dump field to store bucket counts in Devel::Peek output, I would think that %h is doing some sort of looping and is O(number of buckets).

Best, beth

Replies are listed 'Best First'.
Re^5: What is the most efficient way to see if a hash is empty?
by moritz (Cardinal) on Apr 28, 2009 at 11:06 UTC
    I don't understand how you got to that conclusion. My benchmark (when modified to use scalar %hash shows that there is no O(number of buckets) operation. All it does is factor of 4 or 5 difference between empty and non-empty hashes. For me that doesn't look like a loop, but rather like a good optimization for the empty case.

    (For the empty hash the format of scalar %hash differs from the "normal" case. Instead of "$count/$buckets" it's simply 0).

    The number of buckets seems to be stored in the MAX field.

      I got that because the more elements the bigger the difference in performance of empty and full hashes. If all we were doing was checking a member of a structure, why would there be any difference at all related to size? Wouldn't the difference between empty and full be simply the difference between one and two pointer de-references - one to get the allocated bucket count and one to get the used bucket count? I should think that would be at most 200%, not 400% and up. And that's assuming that the code involved in dying consumed only a fraction of the time consumed by fetching one or two variables.

      Best, beth

        See the diagrams for HVs at the bottom of PerlGuts Illustrated.

        • MAX is the number of buckets;
        • FILL is the number of those buckets in use;
        • KEYS the number of keys in the hash.
        • In the empty case ARRAY will be null (even though MAX will be minimum 7), hence the 0 returned for the empty case.

        I think that the differences in moritz' benchmark come about because die is an uncommonly slow opcode--essentially a long jump--which completely swamps the conditional test and so skews the timing beyond recognition.

        [0] Perl> for(2..6){ %a =(); %b = 1.."1e$_"; cmpthese -1, { A=>q[ for(1 .. 10000){ if( %a ){ 1; } } ], B=>q[ for(1 .. 10000){ unless( %b ){ 1; } } ] } };; Rate B A B 72.0/s -- -90% A 726/s 908% -- Rate B A B 72.6/s -- -90% A 736/s 914% -- Rate B A B 70.5/s -- -91% A 751/s 965% -- Rate B A B 69.5/s -- -91% A 759/s 992% -- Rate B A B 71.5/s -- -90% A 715/s 900% -- [0] Perl> for(2..6){ %a =(); %b = 1.."1e$_"; cmpthese -1, { A=>q[ for(1 .. 10000){ unless( %a ){ 1; } } ], B=>q[ for(1 .. 10000){ if( %b ){ 1; } } ] } };; Rate B A B 75.9/s -- -89% A 709/s 834% -- Rate B A B 73.7/s -- -90% A 730/s 890% -- Rate B A B 72.7/s -- -90% A 747/s 928% -- Rate B A B 71.5/s -- -90% A 725/s 914% -- Rate B A B 70.6/s -- -90% A 708/s 903% --

        The difference between empty and full hashes is remarkably consistant--if surprisingly high--regardless of the magnitude of the hash.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^5: What is the most efficient way to see if a hash is empty?
by Anonymous Monk on Apr 28, 2009 at 10:13 UTC
    hv.h
    281 /* the number of keys (including any placeholers) */ 282 #define XHvTOTALKEYS(xhv) ((xhv)->xhv_keys) 289 #define HvKEYS(hv) HvUSEDKEYS(hv) 290 #define HvUSEDKEYS(hv) (HvTOTALKEYS(hv) - HvPLACEHOLDERS_ +get(hv)) 293 #define HvPLACEHOLDERS_get(hv) (SvMAGIC(hv) ? Perl_hv_placeholder +s_get(aTHX_ (const HV *)hv) : 0)
    hv.c
    2536 I32 2537 Perl_hv_placeholders_get(pTHX_ const HV *hv) 2538 { 2539 dVAR; 2540 MAGIC * const mg = mg_find((const SV *)hv, PERL_MAGIC_rhash); 2541 2542 PERL_ARGS_ASSERT_HV_PLACEHOLDERS_GET; 2543 2544 return mg ? mg->mg_len : 0; 2545 }
    mg.c
    411 MAGIC* 412 Perl_mg_find(pTHX_ const SV *sv, int type) 413 { 414 PERL_UNUSED_CONTEXT; 415 if (sv) { 416 MAGIC *mg; 417 for (mg = SvMAGIC(sv); mg; mg = mg->mg_moremagic) { 418 if (mg->mg_type == type) 419 return mg; 420 } 421 } 422 return NULL; 423 }