in reply to Index a file with pack for fast access
This creates an index file with '.idx' appended to the name of the input file:
    #! perl -slw
    use strict;

    open INDEX, '>:raw', "$ARGV[ 0 ].idx" or die $!;

    syswrite INDEX, pack( 'N', 0 ), 4;
    syswrite INDEX, pack( 'N', tell *ARGV ), 4 while <>;

    close INDEX;
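The index is simply a flat run of 4-byte big-endian integers: offset 0 first, then the offset that follows each line, so the last entry is the end-of-file position. A minimal sketch of that layout, built in memory rather than on disk; the sample records are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data: three newline-terminated records.
my $data = "alpha,1\nbeta,22\ngamma,333\n";

# Same scheme as the indexer: offset 0, then the offset after each line.
my $idx = pack 'N', 0;
my $pos = 0;
for my $line ( split /^/, $data ) {
    $pos += length $line;
    $idx .= pack 'N', $pos;
}

# Decode every 4-byte entry back into a list of offsets.
my @offsets = unpack 'N*', $idx;
print "@offsets\n";    # 0 8 16 26
```

Note that 'N' is a 32-bit unsigned value, so this layout can only address offsets below 4 GiB. For larger files a wider template would be needed, for example 'Q>' on a perl built with 64-bit integer support, with each entry then taking 8 bytes instead of 4.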
And this loads the appropriate index file for its input argument and then reads 100 records at random:
    #! perl -slw
    use strict;
    use Time::HiRes qw[ time ];

    our $N //= 100;    # number of reads; override with -N=... (via -s)

    # Slurp the whole index into memory.
    open INDEX, '<:raw', "$ARGV[ 0 ].idx" or die $!;
    my $len = -s( INDEX );
    sysread INDEX, my( $idx ), $len;
    close INDEX;

    my $start = time;

    open DAT, '<', $ARGV[ 0 ] or die $!;
    for( 1 .. $N ) {
        # The final index entry is the end-of-file offset, not a record; skip it.
        my $toRead = int rand( length( $idx ) / 4 - 1 );
        my $offset = unpack 'N', substr $idx, $toRead * 4, 4;
        seek DAT, $offset, 0;
        my $line = <DAT>;
        # print $line;
    }
    close DAT;

    printf "Ave. %.6f seconds/record\n", ( time() - $start ) / $N;
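If only a handful of records are wanted, the index does not have to be loaded in full: entry n sits at byte n * 4 of the .idx file, so a single 4-byte read finds the record. The following is a sketch under that assumption, not part of the original scripts; `fetch_record` is a hypothetical helper name, and the demo data and temporary files exist only to make the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Fetch record $n (0-based) by reading one 4-byte entry from the index
# file instead of slurping the whole index.
sub fetch_record {
    my( $dat, $idx, $n ) = @_;
    seek $idx, $n * 4, 0 or die "seek index: $!";
    ( read( $idx, my $packed, 4 ) // 0 ) == 4
        or die "no index entry for record $n";
    seek $dat, unpack( 'N', $packed ), 0 or die "seek data: $!";
    return scalar readline $dat;
}

# Hypothetical demo data, written and indexed in temporary files.
my( $dat_out, $dat_name ) = tempfile( UNLINK => 1 );
my( $idx_out, $idx_name ) = tempfile( UNLINK => 1 );
binmode $_ for $dat_out, $idx_out;

print {$idx_out} pack 'N', 0;
for my $rec ( "alpha,1\n", "beta,22\n", "gamma,333\n" ) {
    print {$dat_out} $rec;
    print {$idx_out} pack 'N', tell( $dat_out );
}
close $dat_out or die $!;
close $idx_out or die $!;

open my $dat, '<:raw', $dat_name or die $!;
open my $idx, '<:raw', $idx_name or die $!;
print fetch_record( $dat, $idx, 1 );    # prints "beta,22"
```

The trade-off is one extra seek and read per lookup against not holding the index in memory. Asking for the final entry returns undef, because that entry is the end-of-file offset rather than the start of a record.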
And here is a console log with timings of indexing a 1 GB file containing 16 million records and then reading 100 records at random via that index:
    [23:03:42.25] c:\test>indexFile 1GB.csv

    [23:05:08.24] c:\test>readIndexedFile 1GB.csv
    Ave. 0.003699 seconds/record

    [23:05:40.38] c:\test>readIndexedFile 1GB.csv
    Ave. 0.003991 seconds/record
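For a rough sense of scale, the figures in that log can be turned into an indexing rate and an index size. This is back-of-the-envelope arithmetic only: it assumes the prompt timestamps bracket the indexFile run, and it takes "16 million" as an exact record count, which the log does not confirm.

```perl
use strict;
use warnings;

# Convert an hh:mm:ss.ss prompt timestamp to seconds since midnight.
sub to_seconds {
    my( $h, $m, $s ) = split /:/, shift;
    return $h * 3600 + $m * 60 + $s;
}

my $records = 16_000_000;    # assumption: "16 million" taken literally
my $elapsed = to_seconds( '23:05:08.24' ) - to_seconds( '23:03:42.25' );

printf "Indexing time: %.2f s\n",         $elapsed;
printf "Indexing rate: %.0f records/s\n", $records / $elapsed;

# One 4-byte entry per record, plus the leading zero offset.
printf "Index size:    %.1f MB\n", ( $records + 1 ) * 4 / 1e6;
```

On those assumptions the indexing pass took about 86 seconds and the index occupies roughly 64 MB, which is why slurping it whole is practical.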