in reply to How to get fast random access to a large file?
Are you modifying the file?
Have you tried setting the memory parameter when you tie the file? The default is only 2 MiB; increasing it according to how much RAM you have may improve performance.
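As a minimal sketch of what that looks like (the file name big.log and the 512 MiB figure are just assumptions for illustration):

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Raise Tie::File's cache limit so it can remember far more record
# offsets before it starts forgetting them. 'big.log' is hypothetical.
tie my @lines, 'Tie::File', 'big.log',
    memory => 512 * 1024 * 1024,   # allow ~512 MiB instead of the default
    mode   => O_RDONLY             # read-only; drop this if you modify lines
    or die "Cannot tie big.log: $!";

# The first access to a late record still scans everything before it,
# but with a large cache the offsets stay remembered afterwards.
print $lines[-1], "\n";

untie @lines;
```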
The thing you have to remember is that in order to read the last line of a variable-length-record file, you *have* to read all the intermediate ones along the way. At least the first time. After that, Tie::File will remember where the lines are, provided that remembering them doesn't require more than the specified memory limit. Once that limit is exhausted, it has to start forgetting things, which then requires re-discovery if you revisit those forgotten lines later.
It takes 128 MB of raw binary storage to remember the offsets of all 33,554,432 32-character lines in a 1 GB file, and that's storing each offset as a 4-byte binary value. Tie::File uses a hash to store the offsets, which requires considerably more memory. All of which is my way of saying: Tie::File is very good, but it can't work miracles, and if you are working on files bigger than a couple of hundred MB, you must increase the memory parameter value.
If you are modifying the lines, that will slow things down. A lot if you are modifying randomly throughout the file.
Also, you can construct your own index file for the record offsets quite easily. It means you can use substantially less RAM for the index overhead and still achieve very fast random access. It takes a bit of work, but if you're interested, /msg me.
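To give the flavour of the do-it-yourself approach, here is one possible sketch (not my actual code; the names big.log and big.log.idx are made up). It stores each record's byte offset as a packed 8-byte integer, so record N's offset lives at byte N*8 of the index file and can be found with a single seek:

```perl
use strict;
use warnings;

my ($data, $idx) = ('big.log', 'big.log.idx');  # hypothetical file names

# Build the index in one sequential pass: remember where each line starts.
open my $in,  '<:raw', $data or die "open $data: $!";
open my $out, '>:raw', $idx  or die "open $idx: $!";
my $offset = 0;
while ( my $line = <$in> ) {
    print {$out} pack( 'Q', $offset );   # 8-byte unsigned offset per record
    $offset += length $line;
}
close $out;

# Random access: one seek+read in the index, one seek+read in the data file.
open my $ix, '<:raw', $idx or die "open $idx: $!";
sub fetch_line {
    my ($n) = @_;
    seek $ix, $n * 8, 0            or die "seek index: $!";
    read $ix, my $packed, 8        or die "record $n not in index";
    seek $in, unpack( 'Q', $packed ), 0 or die "seek data: $!";
    my $line = <$in>;
    chomp $line;
    return $line;
}
```

Eight bytes per record is still far cheaper than a Perl hash entry, and because the index lives on disk rather than in memory, nothing ever has to be forgotten and re-discovered.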