Thanks for doing that, erix.
'Lookup averaged 0.012486 seconds/record'
Hm. Disappointed with that. I suspect a good deal of that time is down to writing the 1000 found records to the disk.
I suspect that if you commented out the print of the records and reran it, it'd be more in line with the numbers I get here:
    for my $i ( 1 .. $N ) {
        my $rndRec = 1 + int rand( 160e6 );
        # printf "Record $rndRec: ";
        my $pos = binsearch( \$idx, $rndRec );
        if( $pos ) {
            seek DATA, $pos, 0;
            # printf "'%s'", scalar <DATA>;
        }
    }
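The binsearch() called in that loop isn't shown in this post. For anyone following along, here is a minimal sketch of what such a routine could look like. The index layout is an assumption made for illustration, not taken from the actual script: one string of fixed-width entries, each a pair of 64-bit unsigned integers (record number, byte offset), sorted by record number.

```perl
use strict;
use warnings;

# ASSUMED layout (hypothetical, not from the original script): each index
# entry is two 64-bit unsigned integers, (record number, byte offset),
# sorted by record number. 'Q' needs a perl built with 64-bit integers.
use constant ENTRY_TEMPLATE => 'Q Q';
use constant ENTRY_SIZE     => 16;

# Takes a reference to the packed index string and a record number.
# Returns the record's byte offset, or undef if the record is not indexed.
sub binsearch {
    my( $idxRef, $target ) = @_;
    my $lo = 0;
    my $hi = length( $$idxRef ) / ENTRY_SIZE - 1;

    while( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        my( $rec, $pos ) = unpack(
            ENTRY_TEMPLATE,
            substr( $$idxRef, $mid * ENTRY_SIZE, ENTRY_SIZE )
        );
        if(    $rec < $target ) { $lo = $mid + 1 }
        elsif( $rec > $target ) { $hi = $mid - 1 }
        else                    { return $pos }
    }
    return;    # not found
}

# Tiny in-memory index for illustration: record N lives at offset N * 100.
my $idx = join '', map { pack( ENTRY_TEMPLATE, $_, $_ * 100 ) } 1 .. 10;
my $pos = binsearch( \$idx, 7 );
```

Passing the index by reference avoids copying a multi-gigabyte string on every call. One caveat with this sketch: a record at byte offset 0 yields a false value, so a caller would want to test `defined $pos` rather than plain `$pos`.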
The first number is the time taken to load the index. The second run is with a warm cache:
    E:\>c:\test\1118102-searcher e:30GB.dat e:30GB.idx
    16.8919820785522
    Lookup averaged 0.009681803 seconds/record

    E:\>c:\test\1118102-searcher e:30GB.dat e:30GB.idx
    4.17907309532166
    Lookup averaged 0.009416031 seconds/record
Of course, if I run it on an SSD, it looks much nicer, especially as the cache warms up:
    E:\>c:\test\1118102-searcher s:30GB.dat s:30GB.idx
    33.1236040592194
    Lookup averaged 0.000902344 seconds/record

    E:\>c:\test\1118102-searcher s:30GB.dat s:30GB.idx
    3.44389009475708
    Lookup averaged 0.000789429 seconds/record

    E:\>c:\test\1118102-searcher s:30GB.dat s:30GB.idx
    4.35790991783142
    Lookup averaged 0.000551061 seconds/record

    E:\>c:\test\1118102-searcher s:30GB.dat s:30GB.idx
    3.86181402206421
    Lookup averaged 0.000482989 seconds/record

    E:\>c:\test\1118102-searcher s:30GB.dat s:30GB.idx
    4.66845011711121
    Lookup averaged 0.000458750 seconds/record
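The "Lookup averaged" figures are per-lookup averages over the whole run. For anyone wanting to reproduce that kind of measurement, here is a rough sketch using the core Time::HiRes module. The helper name average_time is hypothetical and the searcher script may well do its timing differently.

```perl
use strict;
use warnings;
use Time::HiRes qw( time );    # sub-second resolution replacement for time()

# Hypothetical helper: runs $code $n times and returns the mean
# wall-clock time per call, in seconds.
sub average_time {
    my( $n, $code ) = @_;
    my $start = time;
    $code->() for 1 .. $n;
    return ( time - $start ) / $n;
}

# Example: time 1000 calls of a trivial stand-in for the lookup.
my $avg = average_time( 1000, sub { my $x = 1 + int rand( 160e6 ) } );
printf "Lookup averaged %.9f seconds/record\n", $avg;
```

Since this measures wall-clock time, it includes any disk and console I/O done inside the loop, which is why commenting out the prints can change the result so much.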
In reply to Re^5: Using indexing for faster lookup in large file by BrowserUk
in thread Using indexing for faster lookup in large file by anli_