in reply to Re^7: speeding up row by row lookup in a large db
in thread speeding up row by row lookup in a large db
Alternatively, trade a little time for a lot of space by loading the met data as a 2D array rather than a 3D array. I.e., each element of the second level is a single string containing the 9 values, rather than an array containing 9 elements.
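For clarity, the difference between the two layouts looks like this. This is a minimal sketch with made-up values; the real script below builds @met from the met*.dat files:

    use strict; use warnings;

    ## 3D: each row is a reference to an array of 9 values.
    ## Costs roughly one AV plus 9 SVs per row.
    my @met3d;
    $met3d[0][0] = [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 ];
    my $value3d = $met3d[0][0][4];                      ## direct index; no parsing

    ## 2D: each row is one comma-separated string.
    ## Costs roughly one SV per row, but every access must split the string.
    my @met2d;
    $met2d[0][0] = '1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0';
    my $value2d = ( split ', ', $met2d[0][0] )[ 4 ];    ## parse on demand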
This cuts the load time to 8 seconds and the memory usage to a mere 500 MB. It means splitting the row strings repeatedly to access the individual column values, which slows things down, but still allows the million cells to be processed in around 6 seconds:
#! perl -slw
use strict;
use Storable;                   ## not used below
use Time::HiRes qw[ time ];

our $N ||= 1e3;

my $start1 = time;

## Map cell number -> met dataset number
my @cells;
open CELLS, '<', 'cells.dat' or die $!;
m[(\d+)\s+(\d+)] and $cells[ $1 ] = $2 while <CELLS>;
close CELLS;

## Load each met file as an array of strings: one string per row
my @met = [];                   ## dummy element 0 so datasets occupy indices 1 .. 400
for my $met ( 1 .. 400 ) {
    open IN, '<', sprintf "met%04d.dat", $met or die "dat $met : $!";
    local $/;                   ## slurp the whole file in one read
    my @data;
    $#data = 7300;              ## presize to avoid incremental reallocation
    @data = map{ split "\n" } <IN>;
    close IN;
    push @met, \@data;
}

printf 'All data loaded in %.2f seconds', time() - $start1;
<>;                             ## pause so memory usage can be inspected

my $start2 = time;
for my $cell ( 1 .. $N ) {
    my $row = int rand 7300;
    my $col = int rand 9;
    my $rowData = $met[ $cells[ $cell ] ][ $row ];
    my $value = ( split ', ', $rowData )[ $col ];
}
printf "Accessed $N met datasets at a rate of %.2f\n", $N / ( time - $start2 );

__END__
c:\test\752472>752472 -N=1e6
All data loaded in 8.93 seconds
Accessed 1e6 met datasets at a rate of 165098.24
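Note the bare <>; after the load: it pauses the script so the ~500 MB footprint can be checked (eg. in Task Manager) before the timed lookups start. And the reported rate of ~165,000 lookups/second works out to a little over 6 seconds for the full 1e6 cells, which is where the figure above comes from.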