in reply to Re^4: Finding the size of a nested hash in a HoH
in thread Finding the size of a nested hash in a HoH

A large hash that takes a lot of memory may not be a good thing. I have sometimes run into such cases with my own poor programs, so I am interested in this topic.

In short, "keys" seems faster and smaller than "each" on FreeBSD 8.2 with perl 5.12.2. This is the first time I have used GTop, so there may be something wrong with my test script. If so, please let me know.

This is the Perl script, using GTop and Time::HiRes.
#!/usr/bin/perl
use strict;
use warnings;
use GTop();
use Time::HiRes;

my($gtop,$max,%h,@t);
$gtop=new GTop;
$t[0]=Time::HiRes::time();
printf "###count=$ARGV[1],$ARGV[0]###\n";
p("before");
$max=$ARGV[1];
%h=map { $_ => "test" } (1 .. $max);
p("after hash");
if ($ARGV[0] eq "keys"){
    &with_keys();
} elsif($ARGV[0] eq "each") {
    &with_each();
} else {
    print "else\n";
}
p("after loop");
$t[1]=Time::HiRes::time();
printf "time=%.3f\n", ($t[1]-$t[0]);
exit;
And the shell script that drives it.
perl tmp.pl keys 10000   >  log
perl tmp.pl each 10000   >> log
perl tmp.pl keys 100000  >> log
perl tmp.pl each 100000  >> log
perl tmp.pl keys 1000000 >> log
perl tmp.pl each 1000000 >> log
perl tmp.pl keys 2000000 >> log
perl tmp.pl each 2000000 >> log
The resulting log is below.
###count=10000,keys###
before: size=10035200,vsize=10035200,resident=5500928,share=5143114,rss=5500928
after hash: size=13180928,vsize=13180928,resident=8306688,share=5143114,rss=8306688
after loop: size=13180928,vsize=13180928,resident=8376320,share=5143114,rss=8376320
time=0.043
###count=10000,each###
before: size=10035200,vsize=10035200,resident=5541888,share=5143114,rss=5541888
after hash: size=13180928,vsize=13180928,resident=8347648,share=5143114,rss=8347648
after loop: size=13180928,vsize=13180928,resident=8417280,share=5143114,rss=8417280
time=0.050
###count=100000,keys###
before: size=10035200,vsize=10035200,resident=5541888,share=5143114,rss=5541888
after hash: size=43589632,vsize=43589632,resident=39514112,share=5143114,rss=39514112
after loop: size=43589632,vsize=43589632,resident=39514112,share=5143114,rss=39514112
time=0.689
###count=100000,each###
before: size=10035200,vsize=10035200,resident=5541888,share=5143114,rss=5541888
after hash: size=43589632,vsize=43589632,resident=39514112,share=5143114,rss=39514112
after loop: size=43589632,vsize=43589632,resident=39514112,share=5143114,rss=39514112
time=0.799
###count=1000000,keys###
before: size=10035200,vsize=10035200,resident=5545984,share=5143114,rss=5545984
after hash: size=296296448,vsize=296296448,resident=282710016,share=5143114,rss=282710016
after loop: size=297345024,vsize=297345024,resident=282718208,share=5143114,rss=282718208
time=7.389
###count=1000000,each###
before: size=10035200,vsize=10035200,resident=5545984,share=5143114,rss=5545984
after hash: size=296296448,vsize=296296448,resident=282710016,share=5143114,rss=282710016
after loop: size=297345024,vsize=297345024,resident=282718208,share=5143114,rss=282718208
time=8.522
###count=2000000,keys###
before: size=10035200,vsize=10035200,resident=5545984,share=5143114,rss=5545984
after hash: size=582557696,vsize=582557696,resident=354185216,share=5143114,rss=354185216
after loop: size=583606272,vsize=583606272,resident=360177664,share=5143114,rss=360177664
time=103.454
###count=2000000,each###
before: size=10035200,vsize=10035200,resident=5484544,share=5143114,rss=5484544
after hash: size=582557696,vsize=582557696,resident=359972864,share=5143114,rss=359972864
after loop: size=583606272,vsize=583606272,resident=352419840,share=5143114,rss=352419840
time=268.264

Once the key count exceeds a million, iteration seems to need some extra memory, and memory consumption is the same between "keys" and "each". "keys" becomes much faster in the 2 million case. But Benchmark shows a different result.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Benchmark qw/cmpthese timethese/;

my($max,%h);
$max=1000000;
%h=map { $_ => "test" } (1 .. $max);

sub with_keys{
    foreach my $k (keys %h){
        #no proc
    }
}
sub with_each{
    while( my($k,$v)=each %h){
        #no proc
    }
}

cmpthese( timethese( 100, {
    'with keys'=> &with_keys,
    'with each'=> &with_each,
} ) );
The output shows they are the same.
Benchmark: timing 100 iterations of with each, with keys...
 with each:  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
            (warning: too few iterations for a reliable count)
 with keys:  0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)
            (warning: too few iterations for a reliable count)
                           Rate with each with keys
with each 100000000000000000/s        --        0%
with keys 100000000000000000/s        0%        --
I left my PC alone while running the test scripts... I believe. Would someone give me some insight into these results? I wonder why they differ.

Re^6: Finding the size of a nested hash in a HoH
by BrowserUk (Patriarch) on Nov 11, 2011 at 08:24 UTC

    Your test methodology seems broken to me.

    1. In the first script you are calling subroutines with_keys() and with_each(), but they do not appear in the script?
    2. In the second script, if the keys test moves the process into swapping, then the each test will be equally affected.

    And the output of GTop seems very muddled to me.

    It is easy to verify that keys in a for loop creates a list of the keys (which therefore consumes extra memory) by running this:

    perl -E"my %h = 1..100; for my $k(keys %h){ undef %h; say scalar %h; s +ay $k }"

    It constructs a hash, enters a for loop using keys, then undefs the hash and displays its size (zero) on the first (and every) iteration. The for loop iterates through all the keys even though the hash has been emptied. Therefore a separate list of the keys must have been constructed.
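
    For contrast, each walks the hash's own iterator rather than a prebuilt list, so the analogous one-liner should print just one key and then stop, because undefing the hash kills the iteration (a quick sketch; emptying a hash mid-iteration is otherwise something to avoid):

    perl -E"my %h = 1..100; while( my $k = each %h ){ undef %h; say scalar %h; say $k }"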


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I forgot to paste the functions! I only looked at the process's memory size, so I cut the other GTop outputs.
      #!/usr/bin/perl
      use strict;
      use warnings;
      use GTop();
      use Time::HiRes;

      my($gtop,$max,%h,@t);
      $gtop=new GTop;
      $t[0]=Time::HiRes::time();
      printf "###count=$ARGV[1],$ARGV[0]###\n";
      p("before");
      $max=$ARGV[1];
      %h=map { $_ => "test" } (1 .. $max);
      p("after hash");
      if ($ARGV[0] eq "keys"){
          &with_keys();
      } elsif($ARGV[0] eq "each") {
          &with_each();
      } else {
          print "else\n";
      }
      p("after loop");
      $t[1]=Time::HiRes::time();
      printf "time=%.3f\n", ($t[1]-$t[0]);

      sub p {
          my ($cap)=@_;
          my $m=$gtop->proc_mem($$);
          #printf "$cap: size=%s,vsize=%s,resident=%s,share=%s,rss=%s\n"
          #    ,$m->size,$m->vsize,$m->resident,$m->share,$m->rss;
          printf "%-10s: size=%s\n",$cap,$m->size;
      }
      sub with_keys{
          foreach my $k (keys %h){
              #no proc
          }
      }
      sub with_each{
          while( my($k,$v)=each %h){
              #no proc
          }
      }
      The output looks like this.
      ###count=10000,keys###
      before    : size=8708096
      after hash: size=11853824
      after loop: size=11853824
      time=0.044
      ###count=10000,each###
      before    : size=8708096
      after hash: size=11853824
      after loop: size=11853824
      time=0.050
      ###count=100000,keys###
      before    : size=8708096
      after hash: size=41213952
      after loop: size=41213952
      time=0.682
      ###count=100000,each###
      before    : size=8708096
      after hash: size=41213952
      after loop: size=41213952
      time=0.805
      ###count=1000000,keys###
      before    : size=8708096
      after hash: size=294969344
      after loop: size=296017920
      time=7.568
      ###count=1000000,each###
      before    : size=8708096
      after hash: size=294969344
      after loop: size=296017920
      time=8.563
      ###count=2000000,keys###
      before    : size=8708096
      after hash: size=581230592
      after loop: size=582279168
      time=104.976
      ###count=2000000,each###
      before    : size=8708096
      after hash: size=581230592
      after loop: size=582279168
      time=225.191
      perl -E"my %h = 1..100; for my $k(keys %h){ undef %h; say scalar %h; s +ay $k }"

      I understand what this means now. The count shows 0, but the keys still exist, which means there is a separate list. But I wonder why GTop reports exactly the same memory size for the foreach and while loops? I mean the "after loop: size" value in my output.
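
      Maybe I can put a number on that separate list by copying the keys into an array and measuring it with Devel::Size (a sketch, assuming Devel::Size is installed; the explicit array should be roughly the size of the temporary list that keys builds):

      use Devel::Size qw(total_size);
      my %h = map { $_ => "test" } (1 .. 1_000_000);
      my @k = keys %h;   # an explicit copy of the keys list
      printf "hash     : %d bytes\n", total_size(\%h);
      printf "keys list: %d bytes\n", total_size(\@k);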

      As for the second point, I will change the shell script to run the "each" test first and the "keys" test second, and I'll add some sleep statements to guard against the swap problem. It takes some time, so I'll report back later. Thanks a lot for the reply.

        I changed the shell script like this.
        perl 025-1.pl each 10000 > log
        sleep 10; perl 025-1.pl keys 10000 >> log
        sleep 10; perl 025-1.pl each 100000 >> log
        sleep 10; perl 025-1.pl keys 100000 >> log
        sleep 10; perl 025-1.pl each 1000000 >> log
        sleep 10; perl 025-1.pl keys 1000000 >> log
        sleep 10; perl 025-1.pl each 2000000 >> log
        sleep 10; perl 025-1.pl keys 2000000 >> log
        And the result seems to show that "each" is slower than "keys", especially as the hash becomes larger.
        ###count=10000,each###
        before    : size=8708096
        after hash: size=11853824
        after loop: size=11853824
        time=0.051
        ###count=10000,keys###
        before    : size=8708096
        after hash: size=11853824
        after loop: size=11853824
        time=0.043
        ###count=100000,each###
        before    : size=8708096
        after hash: size=41213952
        after loop: size=41213952
        time=0.791
        ###count=100000,keys###
        before    : size=8708096
        after hash: size=41213952
        after loop: size=41213952
        time=0.680
        ###count=1000000,each###
        before    : size=8708096
        after hash: size=294969344
        after loop: size=296017920
        time=8.561
        ###count=1000000,keys###
        before    : size=8708096
        after hash: size=294969344
        after loop: size=296017920
        time=7.429
        ###count=2000000,each###
        before    : size=8708096
        after hash: size=581230592
        after loop: size=582279168
        time=309.887
        ###count=2000000,keys###
        before    : size=8708096
        after hash: size=581230592
        after loop: size=582279168
        time=99.701

        My questions are:
        1. Why does GTop say the while loop and the foreach loop consume exactly the same memory?
        2. Why is the foreach/keys loop faster in this example? As another person kindly pointed out to me, Benchmark shows the while/each loop is faster.

        Update: I misread the output of Benchmark... Benchmark also says the foreach/keys loop is faster. But the question remains: why?

Re^6: Finding the size of a nested hash in a HoH
by Anonymous Monk on Nov 11, 2011 at 08:52 UTC

    The output shows they are the same.

    This warning is important: warning: too few iterations for a reliable count

    You never actually benchmark with_keys or with_each; you benchmark their return values instead.
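
    In Perl, &with_keys calls the subroutine right away and passes along its return value; to hand Benchmark the code itself, pass a reference, as in this minimal sketch of the fix:

    # &with_keys  runs the sub once, right now; Benchmark then times
    #             the (empty) return value, which takes ~0 CPU
    # \&with_keys passes a code reference Benchmark can call repeatedly
    cmpthese( timethese( 100, {
        'with keys' => \&with_keys,
        'with each' => \&with_each,
    } ) );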

    Consider

    #!/usr/bin/perl --
    use strict;
    use warnings;
    {
        use Benchmark;
        print "##########\n";
        print scalar gmtime, "\n";
        print "perl $] \n";
        for my $iterKeys ( [ -3, 100_000 ], [ 10, 1_000_000 ] ){
            my $count = $iterKeys->[1];
            my %hash = map { $_ => !!0 } 0 .. $count;
            print "count $count\n";
            Benchmark::cmpthese(
                $iterKeys->[0],
                {
                    for_keys => sub {
                        for my $k ( keys %hash ){ }
                        return;
                    },
                    while_each => sub {
                        while( my($k,$v) = each %hash ){ }
                        return;
                    },
                },
            );
            print "\n";
        }
        print "\n";
        print scalar gmtime, "\n";
        print "\n";
    }
    __END__
    ##########
    Fri Nov 11 08:45:57 2011
    perl 5.014001
    count 100000
                 Rate while_each   for_keys
    while_each 6.08/s         --       -62%
    for_keys   16.2/s       166%         --

    count 1000000
               s/iter while_each   for_keys
    while_each   1.85         --       -59%
    for_keys    0.761       143%         --

    Fri Nov 11 08:47:19 2011

    I didn't test past 1 million keys (my machine is ancient, and I didn't feel like swapping), but you can observe a trend -- throwing memory at the problem (for_keys) is faster while it fits in RAM, but that will eventually reverse.

      I didn't understand Benchmark; my mistake. I should have passed the functions as references, and I didn't need timethese(). With a 2 million test and my poor code revised per your advice, the difference becomes smaller, but they never reverse.

      use strict;
      use warnings;
      use Benchmark qw/cmpthese timethese/;

      my %h;
      $|=1;

      sub with_keys{
          foreach my $k (keys %h){
              #no proc
          }
          return;
      }
      sub with_each{
          while( my($k,$v)=each %h){
              #no proc
          }
          return;
      }

      for my $max ( qw(10000 100000 1000000 2000000) ){
          print "\ncase $max\n";
          %h=map { $_ => 1 } (1 .. $max);
          print "hash ready ... count=" . (scalar keys %h) ."\n";
          print scalar gmtime, "\n";
          print "-" x 40 ."\n";
          cmpthese(
              -20, #20 cpu time
              {
                  'test_keys'=> \&with_keys,
                  'test_each'=> \&with_each,
              }
          );
          print "-" x 40 ."\n";
          print scalar gmtime, "\n";
      }
      __DATA__

      case 10000
      hash ready ... count=10000
      Fri Nov 11 13:27:23 2011
      ----------------------------------------
                  Rate test_each test_keys
      test_each  109/s        --      -61%
      test_keys  281/s      158%        --
      ----------------------------------------
      Fri Nov 11 13:28:12 2011

      case 100000
      hash ready ... count=100000
      Fri Nov 11 13:28:12 2011
      ----------------------------------------
                   Rate test_each test_keys
      test_each  6.10/s        --      -32%
      test_keys  8.93/s       46%        --
      ----------------------------------------
      Fri Nov 11 13:29:04 2011

      case 1000000
      hash ready ... count=1000000
      Fri Nov 11 13:29:09 2011
      ----------------------------------------
                s/iter test_each test_keys
      test_each   1.88        --      -24%
      test_keys   1.42       32%        --
      ----------------------------------------
      Fri Nov 11 13:30:05 2011

      case 2000000
      hash ready ... count=2000000
      Fri Nov 11 13:31:05 2011
      ----------------------------------------
                s/iter test_each test_keys
      test_each   3.89        --      -24%
      test_keys   2.96       31%        --
      ----------------------------------------
      Fri Nov 11 13:32:43 2011

      Thanks for your reply.

        The comments from BrowserUk's reply apply to my benchmark as well; it is flawed.

        In while_each I fetch both key and value, while in for_keys I only fetch the keys.
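
        A fairer variant would call each in scalar context, which returns only the key; something like this drop-in for the cmpthese hash above (a sketch, untested):

        # each in scalar context returns just the key, so no value fetch;
        # defined() guards against a false key such as "0" or ""
        while_each_key => sub {
            while( defined( my $k = each %hash ) ){ }
            return;
        },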

        Well, it was a step in the right direction :)