in reply to What is the most efficient way to see if a hash is empty?

Hm. scalar keys %hash does not seem to be O(N):

[0] Perl> %a = 1..1e6;; [0] Perl> %b = ();; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] };; Rate A B A 8354/s -- -15% B 9845/s 18% -- [0] Perl> %b = (1..2e6);; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] };; Rate A B A 821/s -- -0% B 824/s 0% -- [0] Perl> %a = ();; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] };; Rate B A B 815/s -- -18% A 998/s 23% -- [0] Perl> %a = (1,2);; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] };; Rate A B A 843/s -- -1% B 849/s 1% -- [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] };; Rate B A B 804/s -- -3% A 831/s 3% --

I'll let you draw your own conclusions, but mine are that if the hash is completely empty it is slightly faster than if it has any keys. But whether it has 1 key or 1 million make no difference at all.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Replies are listed 'Best First'.
Re^2: What is the most efficient way to see if a hash is empty?
by shmem (Chancellor) on Apr 28, 2009 at 09:23 UTC
    I'll let you draw your own conclusions

    I would conclude that your benchmark may be flawed since for empty hashes the $j++ operation is skipped. Maybe that's the reason for the difference?

      I put the and ++$i there to stop the scalar keys %hash being optimised away for being in a void context:

      [0] Perl> %a = 1..2e6; %b = 1..2e6;; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 10000 ], B=>q[ scalar keys %b for 1 .. 10000 ] };; Rate A B A 804/s -- -26% B 1087/s 35% --

      A problem I've been bitten by with other useless statements in void contexts.

      And because it is far cheaper than an assignment:

      [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 10000 ], B=>q[ my $x = scalar keys %b for 1 .. 10000 ] };; Rate B A B 658/s -- -19% A 812/s 24% --

      and therefore less of a bias to the test.

      Do you have a better way to deal with these issues?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        The least bias would be

        %a = 1..2e6; %b = 1..2e6; cmpthese -1, { A=>q[ scalar keys %a, ++$i for 1 .. 10000 ], B=>q[ scalar keys %b and ++$i for 1 .. 10000 ], C=>q[ my $x = keys %a, ++$i for 1 .. 10000 ], D=>q[ my $x = keys %b and ++$i for 1 .. 10000 ], } __END__ Rate D C A B D 280/s -- -4% -31% -31% C 290/s 4% -- -28% -28% A 404/s 44% 39% -- -1% B 406/s 45% 40% 1% --

        Testing with empty %b gives

        %a = 1..2e6; %b = (); cmpthese -1, { A=>q[ scalar keys %a, ++$i for 1 .. 10000 ], B=>q[ scalar keys %b, ++$i for 1 .. 10000 ], C=>q[ my $x = keys %a, ++$i for 1 .. 10000 ], D=>q[ my $x = keys %b, ++$i for 1 .. 10000 ], } __END__ Rate C D A B C 288/s -- -2% -30% -31% D 293/s 2% -- -28% -29% A 410/s 43% 40% -- -1% B 415/s 44% 41% 1% --

        from which I would conclude that there's a minimal difference between testing an empty and a full hash.

        update: moritz code using these instead of sub {die}:

        use strict; use warnings; use Benchmark qw(cmpthese); for my $len (2..6) { our (%full, %empty); my $a = 'A'; $full{$a++} = $a while length($a) < $len; print "\nLength $len with ", scalar(%full), "items\n"; cmpthese -1, { full => q[scalar keys %full, ++$i for 1 .. 10000], empty => q[scalar keys %empty, ++$i for 1 .. 10000], } } __END__ Length 2 with 22/32items Rate empty full empty 350/s -- -15% full 411/s 17% -- Length 3 with 510/1024items Rate full empty full 406/s -- -2% empty 415/s 2% -- Length 4 with 13961/32768items Rate full empty full 409/s -- -1% empty 414/s 1% -- Length 5 with 310721/524288items Rate empty full empty 414/s -- 0% full 414/s 0% -- Length 6 with 8655839/16777216items Rate empty full empty 407/s -- -4% full 426/s 5% --

        But! If I introduce a third test condition which also tests for %full, I get:

        ... cmpthese -1, { full => q[scalar keys %full, ++$i for 1 .. 10000], empty => q[scalar keys %empty, ++$i for 1 .. 10000], fake => q[scalar keys %full, ++$i for 1 .. 10000], } } __END__ Length 2 with 22/32items Rate fake empty full fake 403/s -- -2% -3% empty 411/s 2% -- -2% full 417/s 4% 2% -- Length 3 with 510/1024items Rate fake full empty fake 415/s -- -1% -1% full 418/s 1% -- -0% empty 419/s 1% 0% -- Length 4 with 13961/32768items Rate full empty fake full 406/s -- -2% -2% empty 414/s 2% -- 0% fake 414/s 2% 0% -- Length 5 with 310721/524288items Rate fake full empty fake 406/s -- -0% -1% full 407/s 0% -- -1% empty 411/s 1% 1% -- Length 6 with 8655839/16777216items Rate fake full empty fake 400/s -- -3% -3% full 411/s 3% -- -0% empty 411/s 3% 0% --

        using a sub instead of a quoted string

        ... cmpthese -1, { full => sub{scalar keys %full, ++$i }, empty => sub{scalar keys %empty, ++$i }, fake => sub{scalar keys %full, ++$i }, } } __END__ Length 2 with 22/32items Rate empty full fake empty 4955017/s -- -4% -5% full 5167154/s 4% -- -1% fake 5215901/s 5% 1% -- Length 3 with 510/1024items Rate full empty fake full 5038923/s -- -4% -5% empty 5265576/s 4% -- -1% fake 5293291/s 5% 1% -- Length 4 with 13961/32768items Rate empty full fake empty 4633858/s -- -1% -5% full 4677165/s 1% -- -4% fake 4858803/s 5% 4% -- Length 5 with 310721/524288items Rate full fake empty full 4812084/s -- -5% -7% fake 5041231/s 5% -- -3% empty 5193418/s 8% 3% -- Length 6 with 8655839/16777216items Rate fake full empty fake 4689731/s -- -4% -16% full 4906438/s 5% -- -12% empty 5604762/s 20% 14% --

        Hmm... let's use moritz's code with strings instead of hashes:

        for my $len (2..6) { our ($full, $empty); my $a = 'A'; $full .= $a while length($full) < $len; print "\nLength $len with ", scalar($full), " items\n"; cmpthese -1, { full => sub { die unless $full }, empty => sub { die if $empty }, fake => sub { die unless $full }, } } __END__ Length 2 with AA items Rate empty full fake empty 29399327/s -- -5% -35% full 31036076/s 6% -- -31% fake 44964665/s 53% 45% -- ...

        It looks like the iteration/sub call overhead is shadowing the scalar %hash lookup, and moritz has been testing something else...

        I'm re-reading No More Meaningless Benchmarks! ;-)