Are you modifying the file?

Have you tried setting the memory parameter when you tie the file? The default is 2 MiB; increasing it according to how much RAM you have may improve performance.
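For reference, here is a minimal sketch of passing the memory option to tie. The throwaway temp file and the 200 MB figure are my own assumptions for illustration; pick a value that suits your machine. The value is given in bytes.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Build a small throwaway file so the sketch runs as-is.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "line $_\n" for 0 .. 999;
close $fh or die "close: $!";

# memory is specified in bytes; 200 MB is an arbitrary illustrative figure.
tie my @lines, 'Tie::File', $path, memory => 200 * 1024 * 1024
    or die "Cannot tie $path: $!";

my $count = scalar @lines;   # counting the records means scanning the file
my $last  = $lines[-1];      # records come back without the trailing newline
print "$count lines, last is '$last'\n";

untie @lines;
```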

The thing you have to remember is that in order to read the last line of a file of variable-length records, you *have* to read all the intermediate ones along the way, at least the first time. After that, T::F will remember where the lines are, provided remembering doesn't require more than the memory limit specified. Once that memory limit is exhausted, it has to start forgetting things, which then have to be rediscovered if you revisit those forgotten lines later.

It takes 128 MB of raw binary storage to remember the offsets of all 33,554,432 32-character lines in a 1 GB file. That's storing each offset as a 4-byte binary integer. Tie::File keeps the offsets in ordinary Perl data structures rather than packed binary, which requires considerably more memory. All of which is my way of saying that Tie::File is very good, but it can't work miracles; if you are working on files bigger than a couple of hundred MB, you must increase the memory parameter value.
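The arithmetic behind that figure, as a quick sketch. It assumes 1 GB means 2**30 bytes and that each 32-byte line includes its newline.

```perl
use strict;
use warnings;

my $file_bytes   = 1024**3;   # 1 GB file
my $line_bytes   = 32;        # assumed to include the newline
my $offset_bytes = 4;         # one packed 32-bit offset per line

my $lines       = $file_bytes / $line_bytes;
my $index_bytes = $lines * $offset_bytes;

printf "%d lines, %d MB of packed offsets\n", $lines, $index_bytes / 1024**2;
```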

If you are modifying the lines, that will slow things down, and by a lot if the modifications are scattered randomly throughout the file.

Also, you can construct your own index file for the record offsets quite easily. It means you can use substantially less RAM for the index overhead and still achieve very fast random access. It takes a bit of work, but if you're interested, /msg me.
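A rough sketch of the index-file idea, not a finished implementation. The names build_index and fetch_line are hypothetical helpers I made up for this illustration, and the 4-byte 'N' pack format is an assumption that limits it to files under 4 GB. One pass over the data writes each line's starting offset as a packed integer; after that, fetching line N costs two seeks and two small reads, with almost no index held in RAM.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write one packed 32-bit offset per line of the data file.
sub build_index {
    my ($data_path, $index_path) = @_;
    open my $in,  '<:raw', $data_path  or die "open $data_path: $!";
    open my $out, '>:raw', $index_path or die "open $index_path: $!";
    my $offset = 0;
    while (my $line = <$in>) {
        print {$out} pack 'N', $offset;   # 'N' = 32-bit big-endian unsigned
        $offset += length $line;          # raw mode, so length is in bytes
    }
    close $out or die "close $index_path: $!";
    close $in;
    return;
}

# Hypothetical helper: return line $n (zero-based) without its newline.
sub fetch_line {
    my ($data_fh, $index_fh, $n) = @_;
    seek $index_fh, $n * 4, 0 or die "seek index: $!";
    my $got = read $index_fh, my $packed, 4;
    die "no such line: $n" unless defined $got && $got == 4;
    seek $data_fh, unpack('N', $packed), 0 or die "seek data: $!";
    my $line = <$data_fh>;
    chomp $line;
    return $line;
}

# Throwaway variable-length test data so the sketch runs as-is.
my ($tmp_fh, $data_path) = tempfile(UNLINK => 1);
print {$tmp_fh} 'x' x ($_ % 7), " record $_\n" for 0 .. 999;
close $tmp_fh or die "close: $!";

my ($idx_tmp, $index_path) = tempfile(UNLINK => 1);
close $idx_tmp;

build_index($data_path, $index_path);

open my $data_fh,  '<:raw', $data_path  or die "open $data_path: $!";
open my $index_fh, '<:raw', $index_path or die "open $index_path: $!";

my $line = fetch_line($data_fh, $index_fh, 500);
print "line 500: '$line'\n";

close $index_fh;
close $data_fh;
```

The index only stays valid while the data file is unchanged; if lines change length, it has to be rebuilt from the point of the change onward.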


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

In reply to Re: How to get fast random access to a large file? by BrowserUk
in thread How to get fast random access to a large file? by gothic_mallard
