in reply to Efficiency of seek

I think you've been given a lot of irrelevant information that completely misses the answer you are seeking [sic].

Is the time consumption of Perl's seek related to file size, or is it a constant-time operation?

For the purposes of random access, seeking is an amortised, constant-time operation, regardless of the size of the file.

That is, all else -- hardware, system buffers, OS, system load, file fragmentation etc. -- being equal, reading the first and then the last records will take the same time as reading the last and then the first, regardless of whether the file is a few tens of megabytes or a few tens of gigabytes.
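By way of illustration, here is a minimal sketch of fixed-length-record random access; the file name records.dat and the 128-byte record length are assumptions, not details from the original question:

    use strict;
    use warnings;
    use Fcntl qw(:seek);

    my $RECLEN = 128;                       # assumed fixed record length

    open my $fh, '<:raw', 'records.dat'     # hypothetical data file
        or die "open: $!";

    sub read_record {
        my ( $fh, $n ) = @_;
        seek $fh, $n * $RECLEN, SEEK_SET or die "seek: $!";
        read( $fh, my $buf, $RECLEN ) == $RECLEN or die "short read";
        return $buf;
    }

    # The seek itself costs the same whether the record is the first
    # or the last; only caching and disk layout affect the read that
    # follows it.
    my $first = read_record( $fh, 0 );
    my $last  = read_record( $fh, 999 );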

Is it a costly operation?

It depends upon the amount and distribution of the records you must access. If you were to read the file's entire contents in a random order, it would be significantly slower than reading that same content sequentially.

If, however, for any given run or time period, you need only a few records from the entire file, seeking to them and reading just those you need is much quicker than reading through the entire file to get to them.
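A sketch of the two approaches, reusing the assumed records.dat layout from above; the wanted record numbers are made up:

    use strict;
    use warnings;
    use Fcntl qw(:seek);

    my $RECLEN = 128;
    my @wanted = ( 12, 9_000, 250_000 );    # hypothetical record numbers

    open my $fh, '<:raw', 'records.dat' or die "open: $!";

    # Seek-and-read: only the blocks holding the wanted records are touched.
    for my $n (@wanted) {
        seek $fh, $n * $RECLEN, SEEK_SET or die "seek: $!";
        read( $fh, my $rec, $RECLEN ) == $RECLEN or die "short read";
        # ... process $rec ...
    }

    # The sequential alternative must read every record just to reach three:
    seek $fh, 0, SEEK_SET;
    my $rec;
    1 while read $fh, $rec, $RECLEN;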

Finally, there are various things that can be done to optimise your seek/read times. First amongst these is to ensure that your file is contiguous on disk. Second, if at all possible, ensure that records do not span disk blocks.
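One possible layout for the records-within-blocks point is sketched below; the 4 KiB block size is an assumption, so check your filesystem before relying on it:

    use strict;
    use warnings;

    # A layout in which no record spans a block boundary: pack as many
    # whole records as fit into each block and waste the remainder.
    my $BLOCK     = 4096;                     # assumed filesystem block size
    my $RECLEN    = 100;                      # payload bytes per record
    my $PER_BLOCK = int( $BLOCK / $RECLEN );  # 40 records fit per block

    # Byte offset of record $n: whole blocks first, then the slot within.
    sub record_offset {
        my ($n) = @_;
        return int( $n / $PER_BLOCK ) * $BLOCK
             + ( $n % $PER_BLOCK ) * $RECLEN;
    }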

It used to be the case -- where the algorithm allowed -- that deferring reads and re-ordering them could reduce head movement and improve throughput, but most disk controllers do this as a matter of course now.

The biggest problem with trying to optimise random access is accounting for the effects of other processes accessing the same file-system and/or hardware.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Efficiency of seek
by salva (Canon) on Sep 01, 2011 at 07:18 UTC
    that deferring reads and re-ordering them could reduce head movement and improve throughput, but most disk controllers do this as a matter of course now
    That is impossible unless the application uses some kind of asynchronous IO in order to queue several read operations and let the OS/hardware handle them in some order that minimizes head movement delays.

    But for common applications using synchronous IO, the app will ask for some data, wait for the OS to fulfill the request, ask for some other data, etc., so there is no way for the OS to reorder the operations.

    A different matter is a multithreaded/multiprocess environment, where the OS/hardware can reorder the IO operations queued at any point in time by all the running threads.
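    To illustrate that last point, here is a speculative multi-process sketch in which each child performs one synchronous read; with several requests outstanding at once, the OS/controller is free to reorder them (records.dat and the offsets are made up):

        use strict;
        use warnings;
        use Fcntl qw(:seek);

        # Eight children, each issuing one synchronous read at a
        # random block-aligned offset.
        my @offsets = map { 4096 * int rand 250_000 } 1 .. 8;

        for my $off (@offsets) {
            defined( my $pid = fork ) or die "fork: $!";
            next if $pid;                   # parent: spawn the next child
            open my $fh, '<:raw', 'records.dat' or die "open: $!";
            seek $fh, $off, SEEK_SET or die "seek: $!";
            read $fh, my $buf, 4096;
            exit 0;
        }
        wait for @offsets;                  # reap all eight children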

      That is impossible ...

      Look up the SCAN and C-SCAN algorithms.

      Start here...


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        SCAN, C-SCAN and, for that matter, any other algorithm (besides those using a crystal ball to see into the future) are useless for single-threaded applications doing synchronous read operations.

        Think about it!