Can you afford 108 MB of RAM?
If so, rewrite your file in binary, packing each KV pair using 'NS' (a 32-bit key plus a 16-bit value, 6 bytes per record). 18885025 * 6 / 2**20 = 108 MB.
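A rough sketch of that conversion step, in case it helps. The input layout is an assumption (one whitespace-separated "key value" pair per line, keys fitting in 32 bits, values in 16 bits), and `pack_pairs` is a made-up helper name, not anything from the original data or code:

```perl
use strict;
use warnings;

# Hypothetical helper: turn text lines of "key value" into 6-byte 'NS' records.
# Assumes keys fit in 32 bits (N) and values fit in 16 bits (S).
sub pack_pairs {
    # Split each non-blank line on whitespace into [ key, value ].
    my @pairs = map { [ split ' ' ] } grep { /\S/ } @_;
    # Sort numerically by key so the packed string can be binary-searched.
    return join '', map { pack 'NS', @$_ } sort { $a->[0] <=> $b->[0] } @pairs;
}

my $packed = pack_pairs( "42 7\n", "5 1000\n", "4000000000 65535\n" );
printf "%d records, %d bytes\n", length( $packed ) / 6, length $packed;
# prints "3 records, 18 bytes"
```

For the full file you would probably not want to hold every pair in a Perl array like this. Sorting the text file first with an external sort and then packing it line by line to a '>:raw' output handle should need far less memory.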
Slurp the entire file into a single string and then use a binary chop on it (which requires the records to be sorted by key), something along the lines of:
    # Slurp the whole packed file into one string.
    open my $fh, '<:raw', $datafile or die $!;
    my $data;
    sysread( $fh, $data, -s( $datafile ) ) or die $!;
    close $fh;

    # Binary chop over the 6-byte records; $right is one past the last candidate.
    sub lookup {
        my $target = shift;
        my( $left, $right ) = ( 0, length( $data ) / 6 );
        while( $left < $right ) {
            my $mid = int( ( $left + $right ) / 2 );
            my( $key, $val ) = unpack 'NS', substr $data, $mid * 6, 6;
            if( $key < $target ) {
                $left = $mid + 1;
            }
            elsif( $key > $target ) {
                $right = $mid;    # half-open range: $mid itself is excluded
            }
            else {
                return $val;
            }
        }
        return;    # not found
    }
In a quick test this achieved 12,500 lookups per second. (Let's see them do that with an RDBMS :)
Notes: I would not be surprised if the above contains bugs; it was thrown together. My test data was only 10e6 lines, so expect the full file to be a little slower.
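Given that caveat, here is a small self-check that could be run before trusting the lookup on real data. It is only a sketch: the table is synthetic (keys 0, 3, 6, ... built in memory), and `chop_lookup` is a hypothetical name. It uses the half-open convention, where `$right` is always one past the last candidate, so it moves `$right` to `$mid` rather than `$mid - 1`:

```perl
use strict;
use warnings;

# Build a small sorted table in memory: keys 0, 3, 6, ..., 2997, value = key.
my $data = join '', map { pack 'NS', $_ * 3, $_ * 3 } 0 .. 999;

# Half-open binary chop over 6-byte records.
sub chop_lookup {
    my $target = shift;
    my( $left, $right ) = ( 0, length( $data ) / 6 );
    while( $left < $right ) {
        my $mid = int( ( $left + $right ) / 2 );
        my( $key, $val ) = unpack 'NS', substr $data, $mid * 6, 6;
        if(    $key < $target ) { $left  = $mid + 1 }
        elsif( $key > $target ) { $right = $mid     }
        else                    { return $val       }
    }
    return;    # not found
}

# Every stored key must be found; every key in between must come back undef.
my $errors = 0;
for my $n ( 0 .. 2999 ) {
    my $got = chop_lookup( $n );
    if( $n % 3 == 0 ) { $errors++ unless defined $got && $got == $n }
    else              { $errors++ if defined $got }
}
print "errors: $errors\n";
```

Probing the gaps between keys as well as the keys themselves matters here, because off-by-one mistakes in the bounds tend to show up only for the first, last, or missing entries.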
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.