Re: What is the most efficient way to see if a hash is empty?

Hm. scalar keys %hash does not seem to be O(N):

[0] Perl> %a = 1..1e6;;
[0] Perl> %b = ();;
[0] Perl> cmpthese -1, { 
    A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], 
    B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] 
};;
    Rate    A    B
A 8354/s   -- -15%
B 9845/s  18%   --

[0] Perl> %b = (1..2e6);;
[0] Perl> cmpthese -1, { 
    A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], 
    B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] 
};;
   Rate   A   B
A 821/s  -- -0%
B 824/s  0%  --

[0] Perl> %a = ();;
[0] Perl> cmpthese -1, { 
    A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], 
    B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] 
};;
   Rate    B    A
B 815/s   -- -18%
A 998/s  23%   --

[0] Perl> %a = (1,2);;
[0] Perl> cmpthese -1, { 
    A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], 
    B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] 
};;
   Rate   A   B
A 843/s  -- -1%
B 849/s  1%  --

[0] Perl> cmpthese -1, { 
    A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], 
    B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] 
};;
   Rate   B   A
B 804/s  -- -3%
A 831/s  3%  --
[download]

I'll let you draw your own conclusions, but mine are that if the hash is completely empty it is slightly faster than if it has any keys. But whether it has 1 key or 1 million make no difference at all.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Comment on Re: What is the most efficient way to see if a hash is empty? Select or Download Code

Replies are listed 'Best First'.
Re^2: What is the most efficient way to see if a hash is empty? by shmem (Chancellor) on Apr 28, 2009 at 09:23 UTC
I'll let you draw your own conclusions I would conclude that your benchmark may be flawed since for empty hashes the `$j++` operation is skipped. Maybe that's the reason for the difference?	[reply] [d/l]
Re^3: What is the most efficient way to see if a hash is empty? by BrowserUk (Patriarch) on Apr 28, 2009 at 09:41 UTC
I put the `and ++$i` there to stop the `scalar keys %hash` being optimised away for being in a void context: `[0] Perl> %a = 1..2e6; %b = 1..2e6;; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 10000 ], B=>q[ scalar keys %b for 1 .. 10000 ] };; Rate A B A 804/s -- -26% B 1087/s 35% --` [download] A problem I've been bitten by with other useless statements in void contexts. And because it is far cheaper than an assignment: `[0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 10000 ], B=>q[ my $x = scalar keys %b for 1 .. 10000 ] };; Rate B A B 658/s -- -19% A 812/s 24% --` [download] and therefore less of a bias to the test. Do you have a better way to deal with these issues? Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l] [select]
Re^4: What is the most efficient way to see if a hash is empty? by shmem (Chancellor) on Apr 28, 2009 at 15:42 UTC
The least bias would be `%a = 1..2e6; %b = 1..2e6; cmpthese -1, { A=>q[ scalar keys %a, ++$i for 1 .. 10000 ], B=>q[ scalar keys %b and ++$i for 1 .. 10000 ], C=>q[ my $x = keys %a, ++$i for 1 .. 10000 ], D=>q[ my $x = keys %b and ++$i for 1 .. 10000 ], } __END__ Rate D C A B D 280/s -- -4% -31% -31% C 290/s 4% -- -28% -28% A 404/s 44% 39% -- -1% B 406/s 45% 40% 1% --` [download] Testing with empty `%b` gives `%a = 1..2e6; %b = (); cmpthese -1, { A=>q[ scalar keys %a, ++$i for 1 .. 10000 ], B=>q[ scalar keys %b, ++$i for 1 .. 10000 ], C=>q[ my $x = keys %a, ++$i for 1 .. 10000 ], D=>q[ my $x = keys %b, ++$i for 1 .. 10000 ], } __END__ Rate C D A B C 288/s -- -2% -30% -31% D 293/s 2% -- -28% -29% A 410/s 43% 40% -- -1% B 415/s 44% 41% 1% --` [download] from which I would conclude that there's a minimal difference between testing an empty and a full hash. update: moritz code using these instead of `sub {die}`: use strict; use warnings; use Benchmark qw(cmpthese); for my $len (2..6) { our (%full, %empty); my $a = 'A'; $full{$a++} = $a while length($a) < $len; print "\nLength $len with ", scalar(%full), "items\n"; cmpthese -1, { full => q[scalar keys %full, ++$i for 1 .. 10000], empty => q[scalar keys %empty, ++$i for 1 .. 10000], } } __END__ Length 2 with 22/32items Rate empty full empty 350/s -- -15% full 411/s 17% -- Length 3 with 510/1024items Rate full empty full 406/s -- -2% empty 415/s 2% -- Length 4 with 13961/32768items Rate full empty full 409/s -- -1% empty 414/s 1% -- Length 5 with 310721/524288items Rate empty full empty 414/s -- 0% full 414/s 0% -- Length 6 with 8655839/16777216items Rate empty full empty 407/s -- -4% full 426/s 5% -- [download] But! If I introduce a third test condition which also tests for `%full`, I get: ... cmpthese -1, { full => q[scalar keys %full, ++$i for 1 .. 10000], empty => q[scalar keys %empty, ++$i for 1 .. 10000], fake => q[scalar keys %full, ++$i for 1 .. 10000], } } __END__ Length 2 with 22/32items Rate fake empty full fake 403/s -- -2% -3% empty 411/s 2% -- -2% full 417/s 4% 2% -- Length 3 with 510/1024items Rate fake full empty fake 415/s -- -1% -1% full 418/s 1% -- -0% empty 419/s 1% 0% -- Length 4 with 13961/32768items Rate full empty fake full 406/s -- -2% -2% empty 414/s 2% -- 0% fake 414/s 2% 0% -- Length 5 with 310721/524288items Rate fake full empty fake 406/s -- -0% -1% full 407/s 0% -- -1% empty 411/s 1% 1% -- Length 6 with 8655839/16777216items Rate fake full empty fake 400/s -- -3% -3% full 411/s 3% -- -0% empty 411/s 3% 0% -- [download] using a sub instead of a quoted string ... cmpthese -1, { full => sub{scalar keys %full, ++$i }, empty => sub{scalar keys %empty, ++$i }, fake => sub{scalar keys %full, ++$i }, } } __END__ Length 2 with 22/32items Rate empty full fake empty 4955017/s -- -4% -5% full 5167154/s 4% -- -1% fake 5215901/s 5% 1% -- Length 3 with 510/1024items Rate full empty fake full 5038923/s -- -4% -5% empty 5265576/s 4% -- -1% fake 5293291/s 5% 1% -- Length 4 with 13961/32768items Rate empty full fake empty 4633858/s -- -1% -5% full 4677165/s 1% -- -4% fake 4858803/s 5% 4% -- Length 5 with 310721/524288items Rate full fake empty full 4812084/s -- -5% -7% fake 5041231/s 5% -- -3% empty 5193418/s 8% 3% -- Length 6 with 8655839/16777216items Rate fake full empty fake 4689731/s -- -4% -16% full 4906438/s 5% -- -12% empty 5604762/s 20% 14% -- [download] Hmm... let's use moritz's code with strings instead of hashes: `for my $len (2..6) { our ($full, $empty); my $a = 'A'; $full .= $a while length($full) < $len; print "\nLength $len with ", scalar($full), " items\n"; cmpthese -1, { full => sub { die unless $full }, empty => sub { die if $empty }, fake => sub { die unless $full }, } } __END__ Length 2 with AA items Rate empty full fake empty 29399327/s -- -5% -35% full 31036076/s 6% -- -31% fake 44964665/s 53% 45% -- ...` [download] It looks like the iteration/sub call overhead is shadowing the scalar %hash lookup, and moritz has been testing something else... I'm re-reading No More Meaningless Benchmarks! ;-)	[reply] [d/l] [select]
Re^5: What is the most efficient way to see if a hash is empty? by BrowserUk (Patriarch) on Apr 28, 2009 at 19:59 UTC