in reply to Re^4: Using indexing for faster lookup in large file
in thread Using indexing for faster lookup in large file
Thanks for doing that erix.
"Lookup averaged 0.012486 seconds/record"
Hm. I'm disappointed with that. I suspect a good deal of that time is down to writing the 1000 found records to disk.
If you commented out the printing of the records and reran it, I suspect it'd be more in line with the numbers I get here:
    for my $i ( 1 .. $N ) {
        my $rndRec = 1 + int rand( 160e6 );
    #    printf "Record $rndRec: ";
        my $pos = binsearch( \$idx, $rndRec );
        if( $pos ) {
            seek DATA, $pos, 0;
    #        printf "'%s'", scalar <DATA>;
        }
    }
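For anyone following along without the searcher source to hand, here is a minimal sketch of what a `binsearch` like the one called above might look like. The index layout is an assumption for illustration only (fixed-width pairs of 64-bit record number and byte offset, sorted by record number), and `binsearch_sketch` and `ENTRY_SIZE` are hypothetical names; the real 1118102-searcher may well differ. The `Q` pack format needs a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# ASSUMPTION: each index entry is two unsigned 64-bit integers,
# (record number, byte offset), and entries are sorted by record number.
use constant ENTRY_SIZE => 16;

# Binary search over the packed index string, passed by reference so the
# (potentially multi-GB) string is never copied.
# Returns the byte offset for $target, or undef if it is not in the index.
sub binsearch_sketch {
    my( $idxRef, $target ) = @_;
    my( $lo, $hi ) = ( 0, length( $$idxRef ) / ENTRY_SIZE - 1 );
    while( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        my( $key, $offset ) =
            unpack 'Q Q', substr( $$idxRef, $mid * ENTRY_SIZE, ENTRY_SIZE );
        if(    $key < $target ) { $lo = $mid + 1 }
        elsif( $key > $target ) { $hi = $mid - 1 }
        else                    { return $offset }
    }
    return;    # not found
}

# Tiny in-memory index for records 1, 3, 5 and 7 at made-up offsets.
my $idx = join '', map pack( 'Q Q', $_, $_ * 100 ), 1, 3, 5, 7;
my $pos = binsearch_sketch( \$idx, 5 );
print defined $pos ? "found at $pos\n" : "not found\n";
```

Because this sketch returns undef for a miss, the caller should test with `defined` rather than plain truth, otherwise a record at offset 0 would look like "not found".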
The first number is the time taken to load the index. The second run is with a warm cache:
    E:\>c:\test\1118102-searcher e:30GB.dat e:30GB.idx
    16.8919820785522
    Lookup averaged 0.009681803 seconds/record

    E:\>c:\test\1118102-searcher e:30GB.dat e:30GB.idx
    4.17907309532166
    Lookup averaged 0.009416031 seconds/record
Of course, if I run it on an SSD, it looks much nicer, especially as the cache warms up:
    E:\>c:\test\1118102-searcher s:30GB.dat s:30GB.idx
    33.1236040592194
    Lookup averaged 0.000902344 seconds/record

    E:\>c:\test\1118102-searcher s:30GB.dat s:30GB.idx
    3.44389009475708
    Lookup averaged 0.000789429 seconds/record

    E:\>c:\test\1118102-searcher s:30GB.dat s:30GB.idx
    4.35790991783142
    Lookup averaged 0.000551061 seconds/record

    E:\>c:\test\1118102-searcher s:30GB.dat s:30GB.idx
    3.86181402206421
    Lookup averaged 0.000482989 seconds/record

    E:\>c:\test\1118102-searcher s:30GB.dat s:30GB.idx
    4.66845011711121
    Lookup averaged 0.000458750 seconds/record
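If you want to produce a "Lookup averaged ... seconds/record" figure for your own lookup code, something along these lines should do. This is only a sketch: `avg_seconds` is a hypothetical helper name, and the hash lookup being timed is a stand-in for the real index search plus seek, so the number it prints says nothing about disk performance.

```perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Run $code->() $n times and return the mean wall-clock seconds per call.
sub avg_seconds {
    my( $code, $n ) = @_;
    my $start = time();
    $code->() for 1 .. $n;
    return ( time() - $start ) / $n;
}

# Stand-in workload (ASSUMPTION): 1000 random lookups in an in-memory hash.
my %h = map { ( $_ => 1 ) } 1 .. 1000;
my $avg = avg_seconds( sub { my $found = $h{ 1 + int rand 1000 } }, 1000 );
printf "Lookup averaged %.9f seconds/record\n", $avg;
```

Leaving the print of each found record out of the timed sub keeps output I/O out of the measurement, which is the point made above about the 0.012486 figure.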
Re^6: Using indexing for faster lookup in large file
by erix (Prior) on Mar 04, 2015 at 14:16 UTC | |
by BrowserUk (Patriarch) on Mar 04, 2015 at 15:37 UTC |