in reply to Using indexing for faster lookup in large file
A pure Perl solution that gives an average lookup time of under 0.5 ms per record.
This script indexes the 30 GB, 160-million-record file in about 45 minutes (10 lines):
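```perl
#! perl -sw
use strict;

open IN, '<', $ARGV[ 0 ] or die $!;       # the large data file
open OUT, '>:raw', $ARGV[1] or die $!;    # binary index output

# For each record, write a 12-byte entry: the leading numeric key as a
# 4-byte big-endian unsigned int ('N') and the record's byte offset as a
# 64-bit unsigned int ('Q', which needs a 64-bit perl).
my $pos = 0;
print( OUT pack 'NQ', m[^(\d+),], $pos ), $pos = tell( IN ) while <IN>;

close OUT;
close IN;
```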
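As a quick check on the sizes involved: each entry written by `pack 'NQ'` is 4 + 8 = 12 bytes, so the index for 160e6 records comes to roughly 2 GB, and a binary search over that many entries needs at most 28 probes:

$$
160\times10^{6}\ \text{entries} \times 12\ \text{bytes} = 1.92\times10^{9}\ \text{bytes} \approx 2\ \text{GB}
$$

$$
\lfloor \log_2(160\times10^{6}) \rfloor + 1 = 27 + 1 = 28\ \text{probes (worst case)}
$$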
And this script loads the 2 GB index into memory, uses a binary search to find the index entry, then uses seek to locate the record and readline to read it (50 lines).
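A minimal sketch of that lookup, not the original 50-line script. It assumes the index layout written by the indexer above (`'NQ'`, 12 bytes per entry), a 64-bit perl (for `Q`), and a data file sorted by its leading numeric key, since a binary search only works on ordered entries. The sub names `load_index`, `find_offset` and `lookup` are hypothetical, and the demo runs on a tiny in-memory "file" rather than real data.

```perl
use strict;
use warnings;

# Hypothetical helpers; index layout assumed to be pack 'NQ' as in the indexer.
use constant ENTRY => 12;    # length( pack 'NQ', 0, 0 )

# Slurp the whole index file into one string (about 2 GB for 160e6 entries).
sub load_index {
    my( $file ) = @_;
    open my $fh, '<:raw', $file or die "$file: $!";
    local $/;                # undef record separator: read everything at once
    return scalar <$fh>;
}

# Binary search the in-memory index; return the stored byte offset for
# $key, or nothing if the key is absent. Assumes keys are in ascending order.
sub find_offset {
    my( $index, $key ) = @_;
    my( $lo, $hi ) = ( 0, int( length( $index ) / ENTRY ) - 1 );
    while( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        my( $k, $off ) = unpack 'NQ', substr( $index, $mid * ENTRY, ENTRY );
        return $off if $k == $key;
        if( $k < $key ) { $lo = $mid + 1 } else { $hi = $mid - 1 }
    }
    return;
}

# Seek to the record's offset in the data file and read that one line.
sub lookup {
    my( $data_fh, $index, $key ) = @_;
    my $off = find_offset( $index, $key );
    return unless defined $off;
    seek( $data_fh, $off, 0 ) or die $!;
    return scalar readline( $data_fh );
}

# Demo on hypothetical sample data, indexed the same way as the indexer.
my $data = "3,foo\n7,bar\n12,baz\n";
open my $dfh, '<', \$data or die $!;
my( $index, $pos ) = ( '', 0 );
while( <$dfh> ) {
    $index .= pack 'NQ', m[^(\d+),], $pos;
    $pos = tell( $dfh );
}
print lookup( $dfh, $index, 7 );    # prints "7,bar"
```

For the real files, the index would come from `load_index( $ARGV[1] )` and `$dfh` from opening the data file; seeking by byte offset works because the indexer stored `tell` positions.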
Re^2: Using indexing for faster lookup in large file (PP < 0.0005s/record)
by anli_ (Novice) on Mar 02, 2015 at 18:09 UTC
by BrowserUk (Patriarch) on Mar 02, 2015 at 18:40 UTC