Can you afford 108 MB of RAM?
If so, rewrite your file in binary, sorted by key, packing each key/value pair into a 6-byte record using pack 'NS' (a 32-bit key followed by a 16-bit value). 18885025 * 6 / 2**20 is about 108 MB.
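A minimal sketch of the conversion step follows. It assumes the original text file holds one whitespace-separated integer key and value per line, that keys fit in 32 bits and values in 16 bits (the limits of 'N' and 'S'), and that the file has already been sorted numerically by key (for example with `sort -n`), so it can be streamed without holding 18.9 million lines in memory. The sub name `pack_file` is hypothetical, not something from this thread.

```perl
use strict;
use warnings;

## Hypothetical helper: stream "key value" text lines from $in and write one
## 6-byte record per line to $out using 'NS' (32-bit big-endian key,
## 16-bit native value). Assumes the input is already sorted numerically by
## key and dies if it is not, because the binary chop relies on that order.
## Returns the number of records written.
sub pack_file {
    my( $in, $out ) = @_;
    binmode $out;                        ## raw bytes, no layer translation
    my( $count, $prev ) = ( 0, -1 );
    while( my $line = <$in> ) {
        my( $key, $val ) = split ' ', $line;
        next unless defined $val;        ## skip blank or malformed lines
        die "input not sorted at line $.\n" if $key < $prev;
        print {$out} pack 'NS', $key, $val;
        $prev = $key;
        ++$count;
    }
    return $count;
}

## Example use on real files (the paths are placeholders):
##   open my $in,  '<', 'data.txt' or die $!;
##   open my $out, '>', 'data.bin' or die $!;
##   my $n = pack_file( $in, $out );
##   close $out or die $!;
##   printf "%d records, %.1f MB\n", $n, $n * 6 / 2**20;
```

Checking the order while packing costs one comparison per line and catches an unsorted input before it silently breaks the lookups.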
Slurp the entire file into a single string and then use a binary chop on it, something along the lines of:
open DATA, '<:raw', $datafile or die $!;
my $data;
sysread( DATA, $data, -s( $datafile ) ) or die $!;
close DATA;

## Binary chop over fixed-width 6-byte records ('NS': 4-byte key, 2-byte value),
## which must be sorted by ascending key.
sub lookup {
    my $target = shift;
    ## Half-open interval [ $left, $right ) of record indices.
    my( $left, $right ) = ( 0, length( $data ) / 6 );
    while( $left < $right ) {
        my $mid = int( ( $left + $right ) / 2 );
        my( $key, $val ) = unpack 'NS', substr $data, $mid * 6, 6;
        if( $key < $target ) {
            $left = $mid + 1;
        }
        elsif( $key > $target ) {
            $right = $mid;    ## not $mid - 1: $right is exclusive
        }
        else {
            return $val;
        }
    }
    return;    ## key not found
}
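Binary chops are easy to get wrong at the edges, so it is worth checking the first key, the last key, and keys that are absent or fall between records. The sketch below is a test-friendly variant, not the code above verbatim: the hypothetical `lookup_in` takes a reference to the packed buffer instead of using the global `$data` (passing a reference avoids copying a ~108 MB string), and runs the same half-open search over 6-byte 'NS' records on a small made-up buffer.

```perl
use strict;
use warnings;

## Hypothetical test-friendly variant: search the packed buffer referenced
## by $bufref for $target. Records are 6 bytes ('NS') sorted by ascending key.
sub lookup_in {
    my( $bufref, $target ) = @_;
    my( $left, $right ) = ( 0, length( $$bufref ) / 6 );    ## [left, right)
    while( $left < $right ) {
        my $mid = int( ( $left + $right ) / 2 );
        my( $key, $val ) = unpack 'NS', substr( $$bufref, $mid * 6, 6 );
        if   ( $key < $target ) { $left  = $mid + 1 }
        elsif( $key > $target ) { $right = $mid }
        else                    { return $val }
    }
    return;    ## undef in scalar context: key not present
}

## Small sorted test buffer: keys 10, 20, ..., 100 mapped to values 1 .. 10.
my $buf = join '', map { pack 'NS', $_ * 10, $_ } 1 .. 10;
```

Keeping `$right` exclusive throughout is what makes `$right = $mid` (rather than `$mid - 1`) the correct update when the probed key is too large.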
In a quick test this achieved 12,500 lookups per second. (Let's see them do that with an RDBMS :)
Notes: I would not be surprised if the above contains bugs; it was thrown together quickly. My test data was only 10e6 lines, so expect lookups on your 18.9 million records to be a little slower.
In reply to Re^3: fast lookups in files by BrowserUk, in thread fast lookups in files by citromatik