Have you considered using nested arrays rather than nested hashes?
Arrays use far less memory than hashes, as they don't need to store the keys. The downside is that they aren't sparse.
But by (say) subtracting 100 from your 3-digit area codes, you eliminate a large chunk of unused space.
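A minimal sketch of that offset idea (the area code and value here are just illustrative):

```perl
use strict;
use warnings;

my @db;
my $area_code = 212;                  # a 3-digit area code, 100..999
$db[ $area_code - 100 ] = 'info';     # indices run 0..899: no wasted slots 0..99
print $db[ 212 - 100 ];               # prints "info"
```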
And why reparse the CSV and reconstruct the DB every time?
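One alternative, sketched here with Storable's store/retrieve (the filename and data are just placeholders): parse the CSV once, serialise the structure, and on later runs reload the frozen copy instead of reparsing.

```perl
use strict;
use warnings;
use Storable qw[ store retrieve ];

my %db = ( 212 => { 555 => '1234' } );   # stand-in for the parsed CSV data
store \%db, 'db.stor';                   # one-time serialisation to disk
my $db = retrieve 'db.stor';             # subsequent runs: fast reload
print $db->{ 212 }{ 555 };               # prints 1234
unlink 'db.stor';                        # tidy up the example file
```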
I did a couple of quick tests.
In the first, I construct a HoH with outer keys 100..999 and inner keys 100..4999, and a 4-char string as the values.
C:\test>826814
Total size in ram: 15132924
1e6 lookups took 1.619
size on disk (excluding blocking): 65340000
Time to construct hash from (ram) file: 14.578
In the second, I used the same ranges of numeric keys and the same data. This time I used a single array, indexed by area_code - 100. I then packed the "info" numbers, as 16-bit integers, into strings. The strings are 'indexed' (via substr) at offset ( prefix_code - 100 ) * 2. The results are:
C:\test>826814-b
Total size in ram: 8871552
1e6 lookups took 0.813
size on disk (excluding blocking): 8822720
Time to construct hash from storable (ram) file: 0.039
That's half the memory usage; 1/8th the disk usage; half the lookup time; and a tiny fraction of the load time. Though that last figure is deceptive, because no IO was involved (in either test).
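The pack/substr scheme can be shown in miniature (the values here are made up, not from the benchmark):

```perl
use strict;
use warnings;

my @info   = ( 1234, 42, 999 );            # "info" values for prefixes 100..102
my $packed = pack 'S*', @info;             # each value occupies 2 bytes
my $prefix = 101;                          # look up the second entry
my $value  = unpack 'S', substr $packed, 2 * ( $prefix - 100 ), 2;
print $value;                              # prints 42
```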
#! perl -slw
use strict;

use Time::HiRes qw[ time ];
use Devel::Size qw[ total_size ];

my %db;
for my $ac ( 100 .. 999 ) {
    $db{ $ac }{ $_ } = '1234' for 100 .. 4999;
}
print "Total size in ram: ", total_size( \%db );

my $start = time;
for ( 1 .. 1e6 ) {
    my $info = $db{ 100 + int( rand 800 ) }{ 100 + int( rand 4900 ) };
}
printf "1e6 lookups took %.3f\n", time() - $start;

## Write the db out as CSV to a ram file.
my $ram = chr(0) x 6.5e7;
open RAM, '+<', \$ram;
seek RAM, 0, 0;

for my $ac ( 100 .. 999 ) {
    for my $pre ( 100 .. 4999 ) {
        printf RAM "$ac,$pre,$db{ $ac }{ $pre }\n";
    }
}
print "size on disk (excluding blocking): ", length $ram;

## Reparse the CSV into a second hash and time it.
seek RAM, 0, 0;
$start = time;
my %db2;
m[([^,]+),([^,]+),([^,]+)] and $db2{ $1 }{ $2 } = $3 while <RAM>;
printf "Time to construct hash from csv (ram) file: %.3f\n", time() - $start;

__END__
C:\test>826814
Total size in ram: 15132924
1e6 lookups took 1.619
size on disk (excluding blocking): 65340000
Time to construct hash from (ram) file: 14.578
#! perl -slw
use strict;

use Storable qw[ freeze thaw ];
use Time::HiRes qw[ time ];
use Devel::Size qw[ total_size ];

## One packed string per area code; 4900 16-bit values per string.
my @db;
for my $ac ( 100 .. 999 ) {
    $db[ $ac - 100 ] = pack 'S*', ( 1234 ) x 4900;
}
print "Total size in ram: ", total_size( \@db );

my $start = time;
for ( 1 .. 1e6 ) {
    my $info = unpack 'S', substr $db[ int( rand 800 ) ], 2 * ( int( rand 4900 ) ), 2;
}
printf "1e6 lookups took %.3f\n", time() - $start;

## Serialise, then time deserialising, all in ram.
my $ram = freeze \@db;
print "size on disk (excluding blocking): ", length $ram;

$start = time;
my @db2 = @{ thaw( $ram ) };
printf "Time to construct hash from storable (ram) file: %.3f\n", time() - $start;

__END__
C:\test>826814-b
Total size in ram: 8871552
1e6 lookups took 0.813
size on disk (excluding blocking): 8822720
Time to construct hash from storable (ram) file: 0.039
In reply to Re: large perl module by BrowserUk
in thread large perl module by minek