in reply to Re^5: Finding the size of a nested hash in a HoH
in thread Finding the size of a nested hash in a HoH

Output shows these are the same.

This warning is important: too few iterations for a reliable count

You never actually benchmark with_keys or with_each; you benchmark their return values instead.
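To make the distinction concrete, here is a minimal sketch (the sub name doit is hypothetical, standing in for with_keys/with_each): writing doit() calls the sub immediately and hands Benchmark its return value, whereas \&doit hands Benchmark the code itself.

```perl
use strict;
use warnings;

# Hypothetical sub standing in for with_keys / with_each.
sub doit { my $x = 0; $x += $_ for 1 .. 100; return; }

# Wrong: doit() is called once, right here; what you end up
# holding is its return value (undef), not the code to be timed.
my $wrong = doit();

# Right: \&doit is a code reference that Benchmark can invoke
# repeatedly, timing each call.
my $right = \&doit;

print defined $wrong ? "defined\n" : "undef\n";   # undef
print ref $right, "\n";                           # CODE

# So the benchmark hash must hold code refs (or anonymous subs):
#   Benchmark::cmpthese( -1, { doit => \&doit } );
```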

Consider

#!/usr/bin/perl --
use strict;
use warnings;
{
    use Benchmark;
    print "##########\n";
    print scalar gmtime, "\n";
    print "perl $] \n";
    for my $iterKeys ( [ -3, 100_000 ], [ 10, 1_000_000 ] ) {
        my $count = $iterKeys->[1];
        my %hash = map { $_ => !!0 } 0 .. $count;
        print "count $count\n";
        Benchmark::cmpthese(
            $iterKeys->[0],
            {
                for_keys => sub {
                    for my $k ( keys %hash ) { }
                    return;
                },
                while_each => sub {
                    while ( my ( $k, $v ) = each %hash ) { }
                    return;
                },
            },
        );
        print "\n";
    }
    print "\n";
    print scalar gmtime, "\n";
    print "\n";
}
__END__
##########
Fri Nov 11 08:45:57 2011
perl 5.014001
count 100000
             Rate while_each  for_keys
while_each 6.08/s         --      -62%
for_keys   16.2/s       166%        --

count 1000000
           s/iter while_each for_keys
while_each   1.85         --     -59%
for_keys    0.761       143%       --

Fri Nov 11 08:47:19 2011

I didn't test past 1 million keys (my machine is ancient and I didn't feel like swapping), but you can observe a trend: throwing memory at the problem (for_keys) is faster while everything fits in RAM, but the advantage should eventually reverse.
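One caveat worth knowing before mixing the two approaches on one hash (an aside, not something the benchmarks above exercise): each keeps its iterator state inside the hash itself, and calling keys on the same hash resets that iterator. A small sketch:

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

my ($k1) = each %hash;   # take one step with the iterator
keys %hash;              # keys() resets the internal iterator
my ($k2) = each %hash;   # iteration starts over from the first key

# Same key twice, because the iterator was reset in between.
print $k1 eq $k2 ? "reset\n" : "advanced\n";   # reset
```

This is why calling keys %hash inside a while/each loop over the same hash can turn it into an endless loop.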

Replies are listed 'Best First'.
Re^7: Finding the size of a nested hash in a HoH
by remiah (Hermit) on Nov 11, 2011 at 13:43 UTC

    I didn't understand Benchmark; my mistake. I should have passed the functions as references, and then I wouldn't have needed timethese(). With a 2-million-key test and my poor code revised per your advice, the difference becomes smaller, but they didn't reverse.

    use strict;
    use warnings;
    use Benchmark qw/cmpthese timethese/;

    my %h;
    $| = 1;

    sub with_keys {
        foreach my $k ( keys %h ) {
            #no proc
        }
        return;
    }

    sub with_each {
        while ( my ( $k, $v ) = each %h ) {
            #no proc
        }
        return;
    }

    for my $max ( qw(10000 100000 1000000 2000000) ) {
        print "\ncase $max\n";
        %h = map { $_ => 1 } ( 1 .. $max );
        print "hash ready ... count=" . ( scalar keys %h ) . "\n";
        print scalar gmtime, "\n";
        print "-" x 40 . "\n";
        cmpthese(
            -20,    # 20 seconds of CPU time
            {
                'test_keys' => \&with_keys,
                'test_each' => \&with_each,
            }
        );
        print "-" x 40 . "\n";
        print scalar gmtime, "\n";
    }
    __DATA__
    case 10000
    hash ready ... count=10000
    Fri Nov 11 13:27:23 2011
    ----------------------------------------
               Rate test_each test_keys
    test_each 109/s        --      -61%
    test_keys 281/s      158%        --
    ----------------------------------------
    Fri Nov 11 13:28:12 2011

    case 100000
    hash ready ... count=100000
    Fri Nov 11 13:28:12 2011
    ----------------------------------------
                Rate test_each test_keys
    test_each 6.10/s        --      -32%
    test_keys 8.93/s       46%        --
    ----------------------------------------
    Fri Nov 11 13:29:04 2011

    case 1000000
    hash ready ... count=1000000
    Fri Nov 11 13:29:09 2011
    ----------------------------------------
              s/iter test_each test_keys
    test_each   1.88        --      -24%
    test_keys   1.42       32%        --
    ----------------------------------------
    Fri Nov 11 13:30:05 2011

    case 2000000
    hash ready ... count=2000000
    Fri Nov 11 13:31:05 2011
    ----------------------------------------
              s/iter test_each test_keys
    test_each   3.89        --      -24%
    test_keys   2.96       31%        --
    ----------------------------------------
    Fri Nov 11 13:32:43 2011

    Thanks for your reply.

      The comments from BrowserUk's reply apply to my benchmark as well; it is flawed.

      In while_each I fetch both key and value, while in for_keys I only fetch the keys.

      Well, it was a step in the right direction :)
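One way to make the two loops fetch the same data (a sketch, not from the benchmarks above): each called in scalar context returns only the next key, with no value copy. The defined() guard matters, because a key of "0" or "" is false and would otherwise end the loop early.

```perl
use strict;
use warnings;

my %h = map { $_ => 1 } 1 .. 5;

my $n = 0;
# In scalar context, each returns just the next key.
# defined() guards against false-but-valid keys like "0" or "".
while ( defined( my $k = each %h ) ) {
    $n++;
}
print "$n\n";   # 5
```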

        I was wrong.

        In while_each I fetch both key and value, while in for_keys I only fetch the keys.

        As you say, this seems to have a great effect when looping millions of times.