gone2015 has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to use Devel::Size to get the size of a large hash -- 5 million or so entries, where the keys are random strings of 32 characters plus "\n", and the values are all undef.
If I run the code below, which builds a hash of just 1 million entries, I get:
                :  Start   : 1st Read :  undef   : 2nd Read :
Hash size       :   0.0 MB :  71.9 MB :   0.0 MB :  71.9 MB :
Total hash      :   0.0 MB :  94.8 MB :   0.0 MB :  94.8 MB :
                :          :          :          :          :
Virtual Memory  : 100.9 MB : 391.1 MB : 383.1 MB : 399.1 MB :
Resident        :   2.8 MB : 292.9 MB : 284.9 MB : 300.9 MB :
So... having 1st read 31.5 MB of file, the hash is apparently 94.8 MB: 71.9 MB given by Devel::Size::size(\%hash), plus 24 * 1E6 bytes (22.9 MB) for the values, where each undef entry is 24 bytes. This is odd -- 94.8 MB of hash appears to require ~290 MB of memory? I tried commenting out the assignment $hash{$_} = undef, and the loop consumed no memory at all -- so whatever is going on, it's to do with the hash!
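To make that arithmetic explicit, here is a minimal sketch that reproduces the "Total hash" figure from the numbers quoted above and compares bytes per entry as seen by Devel::Size with bytes per entry implied by the resident size. The variable names are made up for illustration, and the inputs are simply the rounded figures from this post, so the results are only approximate.

```perl
use strict;
use warnings;

# Figures copied from the post (rounded, so results are approximate).
my $MB               = 1024 * 1024;
my $entries          = 1_000_000;  # lines read from hash.txt
my $hash_mb          = 71.9;       # Devel::Size::size(\%hash) after the 1st read
my $bytes_per_value  = 24;         # size reported for a single undef value
my $resident_start   = 2.8;        # resident MB at start
my $resident_read    = 292.9;      # resident MB after the 1st read

# Values are counted separately: size() does not follow the hash's values.
my $values_mb = $entries * $bytes_per_value / $MB;
my $total_mb  = $hash_mb + $values_mb;

# Per-entry cost according to Devel::Size versus the observed resident growth.
my $per_entry_measured = $total_mb * $MB / $entries;
my $per_entry_resident = ($resident_read - $resident_start) * $MB / $entries;

printf "Values:                 %6.1f MB\n", $values_mb;
printf "Hash plus values:       %6.1f MB\n", $total_mb;
printf "Bytes/entry (measured): %6.0f\n",    $per_entry_measured;
printf "Bytes/entry (resident): %6.0f\n",    $per_entry_resident;
```

The gap between the last two lines is the discrepancy being asked about.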
When I undef %hash, the memory footprint reduces by just 8 MB (391.1 MB -> 383.1 MB). I cannot tell what free space Perl has, so this may or may not be reasonable.
When the file is read the 2nd time, the memory footprint grows to a bit more than it was after the 1st read. I'm not sure what to make of that.
Finally, if I comment out the call to Devel::Size::size() the System Monitor tells me:
                :  Start   : 1st Read :  undef   : 2nd Read :
Virtual Memory  : 100.9 MB : 270.0 MB : 262.0 MB : 277.9 MB :
Resident        :   2.8 MB : 171.8 MB : 163.8 MB : 179.7 MB :
which is similar, except that the footprint is some 120 MB less !!! I don't know what that tells me about the usefulness of Devel::Size ??
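The Virtual Memory and Resident figures above were read off the System Monitor by hand. On Linux the same two numbers can, as far as I know, be read from /proc/self/status inside the script, which makes before/after comparisons easier to repeat. The following is only a sketch: memory_mb() is a hypothetical helper, not part of Devel::Size, and it assumes the VmSize and VmRSS lines are present and reported in kB.

```perl
use strict;
use warnings;

# Hypothetical helper: return (virtual, resident) in MB for this process,
# or an empty list if /proc/self/status cannot be read (e.g. not on Linux).
sub memory_mb {
    open my $fh, '<', '/proc/self/status' or return;
    my %kb;
    while (my $line = <$fh>) {
        $kb{$1} = $2 if $line =~ /^(VmSize|VmRSS):\s+(\d+)\s+kB/;
    }
    close $fh;
    return map { ($kb{$_} // 0) / 1024 } qw(VmSize VmRSS);
}

# Print the current footprint with a label, if it is available.
sub report {
    my ($label) = @_;
    if (my ($virt, $res) = memory_mb()) {
        printf STDERR "%-10s virtual %.1f MB, resident %.1f MB\n",
            $label, $virt, $res;
    }
}

# Small demonstration with made-up keys (not the real hash.txt data).
report('Start');
my %demo;
$demo{"key$_\n"} = undef for 1 .. 100_000;
report('Filled');
undef %demo;
report('undef');
```

Calling report() just before and just after Devel::Size::size() would show how much of the footprint is due to the measurement itself.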
I'm enquiring, because once I get to 5 million such strings in the hash, the footprint has grown to 1.0 GB or so, and my machine is thrashing when I try to use the hash :-( (and Devel::Size::size() takes longer and longer...)
Help !! (OK, I could go and get some more memory, 1G is not a lot in today's market...)
For completeness: perl, v5.10.0 built for x86_64-linux-thread-multi, and Linux 2.6.25.14-108.fc9.x86_64 #1 SMP Mon Aug 4 13:46:35 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
    use strict ;
    use warnings ;

    use Devel::Size () ;

    my %hash = () ;

    # Size of a single undef hash value, as reported by Devel::Size
    $hash{'dummy'} = undef ;
    printf STDERR "Single entry requires: %d Bytes\n", bytes($hash{dummy}) ;

    read_it() ;

    undef %hash ;
    show(0, \%hash) ;
    wait_for_it('Just "undef"ed the hash') ;

    read_it() ;

    # Read hash.txt, using each line (including its "\n") as a key with an undef value
    sub read_it {
      open my $FH, "<", "hash.txt" or die "Cannot open hash.txt: $!" ;
      wait_for_it('About to read') ;
      my $e = 0 ;
      my $c = show($e, \%hash) ;
      while (<$FH>) {
        $hash{$_} = undef ;
        $e++ ;
        $c = ($c - 1) || show($e, \%hash) ;   # report every 50_000 entries
      } ;
      wait_for_it('Finished Reading') ;
    } ;

    # Print the entry count and hash size; returns the reporting interval
    sub show {
      my ($e, $rh) = @_ ;
      printf STDERR "%8d entries: %3.1fM Bytes\n", $e, mbytes($rh) ;
      return 50_000 ;
    } ;

    sub mbytes {
      my ($r) = @_ ;
      bytes($r)/(1024 * 1024) ;
    } ;

    sub bytes {
      my ($r) = @_ ;
      return Devel::Size::size($r) ;
    } ;

    # Pause until a '.' is typed on STDIN, so the System Monitor can be read
    sub wait_for_it {
      print STDERR "$_[0]..." ;
      my $ch = '' ;
      while ($ch !~ m/\./) {
        sysread STDIN, $ch, 1 ;
      } ;
    } ;