in reply to Re^2: fast lookups in files
in thread fast lookups in files
Can you afford 108 MB of RAM?
If so, rewrite your file in binary, packing each key/value pair with the pack template 'NS' (a 32-bit key plus a 16-bit value, so 6 bytes per record). 18885025 records * 6 bytes / 2**20 = 108 MB.
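The conversion step might look something like the sketch below. The helper name pack_records and the input format (one whitespace-separated "key value" pair per line, already sorted numerically by key) are assumptions, since the layout of the original text file isn't shown here. Note that 'N' is big-endian but 'S' is native-endian, so the packed file should be read on the same kind of machine that wrote it.

```perl
use strict;
use warnings;

# Hypothetical helper: convert "key value" text lines into fixed-width
# 6-byte records. Assumes the input is already sorted numerically by key.
sub pack_records {
    my( $in, $out ) = @_;
    my $count = 0;
    while( my $line = <$in> ) {
        my( $key, $val ) = split ' ', $line;
        next unless defined $val;    # skip blank or malformed lines
        die "non-numeric record: $line" unless "$key$val" =~ /^\d+$/;
        die "key $key does not fit in 32 bits\n"   if $key > 0xFFFF_FFFF;
        die "value $val does not fit in 16 bits\n" if $val > 0xFFFF;
        # N = 32-bit unsigned big-endian, S = 16-bit unsigned native-endian
        print {$out} pack 'NS', $key, $val;
        ++$count;
    }
    return $count;
}

# Small demonstration using in-memory filehandles instead of real files.
my $text = "1 10\n5 50\n9 90\n";
open my $in, '<', \$text or die $!;
my $packed = '';
open my $out, '>', \$packed or die $!;
binmode $out;
my $n = pack_records( $in, $out );
close $out;
printf "%d records, %d bytes\n", $n, length $packed;
```

For the real file you would open the text file for reading and the binary file with '>:raw' in place of the in-memory handles.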
Slurp the entire file into a single string and then do a binary chop on it, something along the lines of:
    open my $fh, '<:raw', $datafile or die $!;
    my $data;
    sysread( $fh, $data, -s( $datafile ) ) or die $!;
    close $fh;

    sub lookup {
        my $target = shift;
        # Search the half-open range of record indices [$left, $right)
        my( $left, $right ) = ( 0, length( $data ) / 6 );
        while( $left < $right ) {
            my $mid = int( ( $left + $right ) / 2 );
            my( $key, $val ) = unpack 'NS', substr $data, $mid * 6, 6;
            if( $key < $target ) {
                $left = $mid + 1;
            }
            elsif( $key > $target ) {
                $right = $mid;    # $right is exclusive, so not $mid - 1
            }
            else {
                return $val;
            }
        }
        return;    # not found
    }
In a quick test this achieved 12,500 lookups per second. (Let's see them do that with an RDBMS :)
Notes: I would not be surprised if the above contains bugs; it was thrown together. My test data was only 10e6 lines, so expect it to be a little slower on your larger file.
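Given that caveat, one way to gain some confidence is to compare the binary chop against a plain hash on a small synthetic dataset, probing keys that are present, absent, and out of range. The following is only a sketch: lookup_in is a hypothetical variant that takes the packed string by reference so that it is self-contained, and the test data (every third integer as a key) is made up for illustration.

```perl
use strict;
use warnings;

# Binary search over a string of sorted 6-byte 'NS' records.
# Uses the half-open interval [$left, $right) of record indices.
sub lookup_in {
    my( $dataref, $target ) = @_;
    my( $left, $right ) = ( 0, length( $$dataref ) / 6 );
    while( $left < $right ) {
        my $mid = int( ( $left + $right ) / 2 );
        my( $key, $val ) = unpack 'NS', substr $$dataref, $mid * 6, 6;
        if(    $key < $target ) { $left  = $mid + 1 }
        elsif( $key > $target ) { $right = $mid }
        else                    { return $val }
    }
    return;    # not found
}

# Hypothetical test data: keys 0, 3, 6, ... 2997 with value = key / 3.
my %expect = map { $_ * 3 => $_ } 0 .. 999;
my $data   = join '',
             map  { pack 'NS', $_, $expect{$_} }
             sort { $a <=> $b } keys %expect;

# Probe every integer in and just beyond the key range and compare
# the result with what the hash says.
my $errors = 0;
for my $probe ( 0 .. 3000 ) {
    my $got  = lookup_in( \$data, $probe );
    my $want = $expect{$probe};
    my $ok   = defined $want
             ? ( defined $got && $got == $want )
             : !defined $got;
    ++$errors unless $ok;
}
print "mismatches: $errors\n";
```

Because the probes include keys that fall between stored keys and past either end, an off-by-one in the bounds handling should show up as a nonzero mismatch count.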