in reply to Help performing "random" access on very very large file
An option you didn't list is: create an index. (Oops, I guess that's an implementation of option 4.) That's essentially what Tie::File does, except that Tie::File builds the index in memory, whereas you'd be building yours in a file.
There are two advantages: Tie::File would easily require more than 1GB of memory in your situation, and if the index resides on disk, you build it once and reuse it as many times as you like, as long as the data file doesn't change.
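For contrast, the Tie::File version would look something like this minimal sketch (the file name is mine); it's convenient, but the per-line offset table it keeps lives in RAM:

    use strict;
    use warnings;
    use Tie::File;

    # Convenient, but Tie::File keeps its line-offset table in memory,
    # which is what blows past 1GB on a file this size.
    tie my @lines, 'Tie::File', 'huge.txt' or die "tie: $!";
    print $lines[1_000_000], "\n";   # random access by line number
    untie @lines;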
Read through the huge file, writing the starting offset of every line into another file in fixed-width binary, such as pack('N2', $high, $low) or pack('Q', $addr). A sketch of that pass follows.
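A minimal sketch of the indexing pass, assuming the data file is huge.txt and the index is huge.txt.idx (both names are mine), using the pack('Q') format:

    use strict;
    use warnings;

    open my $data,  '<', 'huge.txt'     or die "huge.txt: $!";
    open my $index, '>', 'huge.txt.idx' or die "huge.txt.idx: $!";
    binmode $data;
    binmode $index;

    my $offset = 0;
    while (defined(my $line = <$data>)) {
        # Record where this line began; 'Q' needs a 64-bit Perl.
        print {$index} pack('Q', $offset);
        $offset += length $line;   # byte count, since the handle is binmoded
    }
    close $index or die "close huge.txt.idx: $!";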
Then you can seek to any record boundary in the index file (a multiple of 8 bytes, the record size of either pack format above) to learn where to seek in the data file.
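Fetching, say, line $n (0-based) then looks something like this sketch, with the same assumed file names and 8-byte pack('Q') records:

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET);

    my $n        = 1_000_000;   # 0-based line number to fetch
    my $rec_size = 8;           # one pack('Q') offset per line

    open my $index, '<', 'huge.txt.idx' or die "huge.txt.idx: $!";
    binmode $index;
    seek $index, $n * $rec_size, SEEK_SET or die "seek index: $!";
    read($index, my $packed, $rec_size) == $rec_size or die "short read";
    my $offset = unpack 'Q', $packed;

    open my $data, '<', 'huge.txt' or die "huge.txt: $!";
    binmode $data;
    seek $data, $offset, SEEK_SET or die "seek data: $!";
    print scalar <$data>;       # the whole line, up to its newline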
Note: Perl I/O has a reputation for being slow (has that been fixed?), so you could write a tiny C program to create the index file; building it is pure I/O and just as simple to write in C as in Perl.
Note: this would require that your Perl, and its seek and tell, handle 64-bit numbers (i.e., a Perl built with large-file support).