$Library_Index{<$Library>} = tell($library), scalar <$Library> until eof($Library);
I tested the write-the-index-to-disc code with a file containing 17,000 id/record pairs, each data record being roughly 300,000 bytes (5.2GB in total).
This creates the index and writes it to disc:
#! perl -slw
use strict;
use Storable qw[ store ];
print time;
my %idx;
$idx{ <> } = tell( STDIN ), scalar <> until eof STDIN;
store \%idx, '1031021.idx' or die $!;
print time;
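For anyone puzzling over the one-liner: it depends on the right-hand side of the assignment (the tell) being evaluated before the hash subscript (the <>), so the offset stored is that of the id line itself, and the trailing scalar <> then skips the data record. Here is a spelled-out sketch of the same loop, run against a tiny made-up file. The sample data and the name build_index are mine, purely for illustration; keys keep their trailing newline, just as they do in the one-liner.

```perl
#! perl
use strict;
use warnings;
use File::Temp qw[ tempfile ];

# Hypothetical sample data: an id line followed by its data record line.
my( $fh, $path ) = tempfile( UNLINK => 1 );
binmode $fh;    # keep byte offsets identical on Windows (no CRLF translation)
print {$fh} "id1\n", "AAAA\n", "id2\n", "BBBBBB\n", "id3\n", "CC\n";
close $fh or die $!;

# Long-hand equivalent of:
#   $idx{ <> } = tell( STDIN ), scalar <> until eof STDIN;
sub build_index {
    my( $file ) = @_;
    open my $in, '<', $file or die "$file: $!";
    binmode $in;
    my %idx;
    until( eof $in ) {
        my $offset = tell $in;    # byte position of the id line
        my $id     = <$in>;       # the id, newline included
        my $record = <$in>;       # read past the data record
        $idx{ $id } = $offset;
    }
    close $in;
    return \%idx;
}

my $idx = build_index( $path );
for my $id ( sort keys %$idx ) {
    ( my $label = $id ) =~ s/\n\z//;
    printf "%s => %d\n", $label, $idx->{ $id };
}
# id1 => 0
# id2 => 9
# id3 => 20
```

Only the ids and one integer per id end up in the hash, which is why the index stays small no matter how big the records are.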
The whole process takes a little over 3 minutes:
C:\test>1031021-i.pl <1031021.dat
1367160156
1367160362
C:\test>dir 1031021*
28/04/2013 15:30 193 1031021-i.pl
28/04/2013 15:04 5,272,940,608 1031021.dat
28/04/2013 15:46 316,385 1031021.idx
28/04/2013 15:29 374 1031021.pl
And this code loads that index from disk (<1 second) and then reads 1000 random records (about 27 seconds) using it:
#! perl -slw
use strict;
use Storable qw[ retrieve ];
print time;
my $idx = retrieve '1031021.idx' or die $!;
print time;
open DAT, '+<', '1031021.dat' or die $!;
for( 1 .. 1000 ) {
my( $id, $offset ) = each %$idx;
seek DAT, $offset, 0;
my $vid = <DAT>;
die 'mismatch' unless $id eq $vid;
my $data = <DAT>;
}
close DAT;
print time;
Run:
C:\test>1031021.pl
1367160624
1367160624
1367160651
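In real use you would look records up by id rather than walk the hash with each. A minimal sketch of that, assuming the same layout (id line, then record line) and an index whose keys keep the trailing newline. The sample data, the file names and fetch_record are hypothetical, and the offsets for the sample were worked out by hand.

```perl
#! perl
use strict;
use warnings;
use File::Temp qw[ tempdir ];
use Storable   qw[ store retrieve ];

my $dir     = tempdir( CLEANUP => 1 );
my $dat     = "$dir/sample.dat";
my $idxfile = "$dir/sample.idx";

# Hypothetical sample data: an id line followed by its data record line.
open my $out, '>', $dat or die "$dat: $!";
binmode $out;
print {$out} "id1\n", "AAAA\n", "id2\n", "BBBBBB\n", "id3\n", "CC\n";
close $out or die $!;

# Byte offsets of the three id lines, stored and reloaded via Storable.
my %offsets = ( "id1\n" => 0, "id2\n" => 9, "id3\n" => 20 );
store( \%offsets, $idxfile ) or die "store: $!";
my $idx = retrieve( $idxfile ) or die "retrieve: $!";

# Seek to the id line, verify it, then return the record that follows.
sub fetch_record {
    my( $fh, $index, $id ) = @_;
    my $offset = $index->{ "$id\n" };
    return undef unless defined $offset;    # unknown id
    seek( $fh, $offset, 0 ) or die "seek: $!";
    my $vid = <$fh>;
    die "index mismatch for $id" unless defined $vid and $vid eq "$id\n";
    my $data = <$fh>;
    chomp $data;
    return $data;
}

open my $in, '<', $dat or die "$dat: $!";
binmode $in;
print fetch_record( $in, $idx, 'id2' ), "\n";    # BBBBBB
print fetch_record( $in, $idx, 'id3' ), "\n";    # CC
close $in;
```

Each lookup costs one hash access, one seek and two line reads, whatever the size of the data file.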