I think you've been given a lot of irrelevant information that completely misses the answer you are seeking [sic].

time consumption of perl's seek is related with file size, or it's a constant time operation?

For the purposes of random access, seeking is an amortised, constant time operation, regardless of the size of the file.

That is, all else -- hardware, system buffers, OS, system load, file fragmentation etc. -- being equal, reading the first and then the last record will take the same time as reading the last and then the first, regardless of whether the file is a few tens of megabytes or a few tens of gigabytes.

Is it a costly operation?

It depends upon the number and distribution of the records you must access. If you were to read the file's entire contents in a random order, it would be significantly slower than reading that same content sequentially.

If, however, for any given run or time period you only need a few records from the entire file, seeking to them and reading just those you need is much quicker than reading through the entire file to get to them.
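
As a rough illustration of that kind of access, here is a minimal sketch. It assumes fixed-length records, so that record N starts at byte N * $RECLEN; the record length, the read_record helper and the demo file are all made up for the example and are not from the original question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: fixed-length records, so record N starts at byte N * $RECLEN.
my $RECLEN = 64;    # hypothetical record length in bytes

# Read record number $n (0-based) from an open binary filehandle.
sub read_record {
    my ( $fh, $n ) = @_;
    seek( $fh, $n * $RECLEN, 0 )    # 0 == SEEK_SET: offset from start of file
        or die "seek failed: $!";
    my $got = read( $fh, my $buf, $RECLEN );
    die "read failed: $!" unless defined $got;
    die "short read for record $n" unless $got == $RECLEN;
    return $buf;
}

# Demo: build a small file of 1000 space-padded records,
# then fetch the last record and the first, in that order.
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} pack( "A$RECLEN", "record $_" ) for 0 .. 999;
close $out or die "close failed: $!";

open my $fh, '<:raw', $path or die "open failed: $!";
my $last  = read_record( $fh, 999 );
my $first = read_record( $fh, 0 );
close $fh;
```

The cost of each read_record call does not depend on how many records lie between the current position and the target; only the two records requested are ever read.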

Finally, there are various things that can be done to optimise your seek/read times. Primary amongst these is to ensure that your file is contiguous on disk. Second, if at all possible, ensure that records do not span blocks.
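
One way to keep records from spanning blocks is to pad them to a length that divides the block size evenly. The sketch below assumes a power-of-two block size (4096 bytes is only a guess; check your own file system) and a file that starts on a block boundary. The padded_length helper is hypothetical.

```perl
use strict;
use warnings;

# Return a padded record length such that no record smaller than a block
# straddles a block boundary. Assumes $block is a power of two.
sub padded_length {
    my ( $raw_len, $block ) = @_;

    # Records of a block or more cannot avoid spanning blocks;
    # round up to whole blocks so each one at least starts on a boundary.
    return $block * int( ( $raw_len + $block - 1 ) / $block )
        if $raw_len >= $block;

    # Smaller records: round up to the next power of two, which always
    # divides a power-of-two block size evenly.
    my $len = 1;
    $len *= 2 while $len < $raw_len;
    return $len;
}

my $block  = 4096;                              # assumed block size
my $padded = padded_length( 100, $block );      # 100-byte records
my $per_block = $block / $padded;               # whole records per block
```

The trade-off is wasted space: the padding bytes are stored but never used, so this is only worthwhile when random reads dominate.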

It used to be the case -- where the algorithm allowed -- that deferring reads and re-ordering them could reduce head movement and improve throughput, but most disk controllers do this as a matter of course now.
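
For completeness, deferring and re-ordering reads by hand might look something like the sketch below. It again assumes fixed-length records; fetch_records and the demo file are invented for the example, and on modern hardware this may well make no measurable difference.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $RECLEN = 16;    # hypothetical fixed record length in bytes

# Read the wanted records in ascending offset order, then hand them
# back in the order the caller asked for them.
sub fetch_records {
    my ( $fh, @wanted ) = @_;
    my %data;
    for my $n ( sort { $a <=> $b } @wanted ) {
        next if exists $data{$n};    # read duplicates only once
        seek( $fh, $n * $RECLEN, 0 ) or die "seek failed: $!";
        my $got = read( $fh, my $buf, $RECLEN );
        die "could not read record $n"
            unless defined $got && $got == $RECLEN;
        $data{$n} = $buf;
    }
    return map { $data{$_} } @wanted;
}

# Demo: 100 space-padded records, requested out of order with a repeat.
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} pack( "A$RECLEN", "rec $_" ) for 0 .. 99;
close $out or die "close failed: $!";

open my $fh, '<:raw', $path or die "open failed: $!";
my @got = fetch_records( $fh, 42, 7, 99, 7 );
close $fh;
```

This only works where the algorithm can wait until it knows the whole batch of record numbers before it needs any of the data.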

The biggest problem with trying to optimise random access is accounting for the effects of other processes accessing the same file-system and/or hardware.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Efficiency of seek by BrowserUk
in thread Efficiency of seek by llancet
