in reply to Re^7: speeding up row by row lookup in a large db
in thread speeding up row by row lookup in a large db
Alternatively, trade a little time for a lot of space by loading the met data as a 2D array rather than a 3D array. I.e., each element of the second level is a single string containing the 9 values, rather than an array containing 9 elements.
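For clarity, the difference between the two layouts looks like this. This is a minimal sketch with made-up values; the real script below builds @met from the met*.dat files:

    use strict; use warnings;

    ## 3D: each row is a reference to an array of 9 values.
    ## Costs roughly one AV plus 9 SVs per row.
    my @met3d;
    $met3d[0][0] = [ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 ];
    my $value3d = $met3d[0][0][4];                      ## direct index; no parsing

    ## 2D: each row is one comma-separated string.
    ## Costs roughly one SV per row, but every access must split the string.
    my @met2d;
    $met2d[0][0] = '1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0';
    my $value2d = ( split ', ', $met2d[0][0] )[ 4 ];    ## parse on demand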
This cuts the load time to 8 seconds and the memory usage to a mere 500 MB. It means splitting the row strings repeatedly to access the individual column values, which slows things down, but still allows the million cells to be processed in around 6 seconds:
#! perl -slw
use strict;
use Storable;                   ## not used below
use Time::HiRes qw[ time ];

our $N ||= 1e3;

my $start1 = time;

## Map cell number -> met dataset number
my @cells;
open CELLS, '<', 'cells.dat' or die $!;
m[(\d+)\s+(\d+)] and $cells[ $1 ] = $2 while <CELLS>;
close CELLS;

## Load each met file as an array of strings: one string per row
my @met = [];                   ## dummy element 0 so datasets occupy indices 1 .. 400
for my $met ( 1 .. 400 ) {
    open IN, '<', sprintf "met%04d.dat", $met or die "dat $met : $!";
    local $/;                   ## slurp the whole file in one read
    my @data;
    $#data = 7300;              ## presize to avoid incremental reallocation
    @data = map{ split "\n" } <IN>;
    close IN;
    push @met, \@data;
}

printf 'All data loaded in %.2f seconds', time() - $start1;
<>;                             ## pause so memory usage can be inspected

my $start2 = time;
for my $cell ( 1 .. $N ) {
    my $row = int rand 7300;
    my $col = int rand 9;
    my $rowData = $met[ $cells[ $cell ] ][ $row ];
    my $value = ( split ', ', $rowData )[ $col ];
}
printf "Accessed $N met datasets at a rate of %.2f\n", $N / ( time - $start2 );

__END__
c:\test\752472>752472 -N=1e6
All data loaded in 8.93 seconds
Accessed 1e6 met datasets at a rate of 165098.24
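Note the bare <>; after the load: it pauses the script so the ~500 MB footprint can be checked (eg. in Task Manager) before the timed lookups start. And the reported rate of ~165,000 lookups/second works out to a little over 6 seconds for the full 1e6 cells, which is where the figure above comes from.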