Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi everybody! I have about 10_000 files and my program reads each of them from time to time. All files have the following structure: header length, header and body. It is strange, but reading the length of the header takes about 0.012 sec (I use simple code for it: read(HANDLE,$length,4)), while reading the header & body takes about 0.005 sec. Reading 4 bytes takes more time than the whole file. Why?

Replies are listed 'Best First'.
Re: Reading files too slow
by cdarke (Prior) on Jan 14, 2009 at 11:16 UTC
    Probably buffering and/or cache. On most systems there is an optimum chunk size to be read from a disk, usually measured in KB rather than bytes. So, reading 4 bytes will probably read a lot more, and that extra data will be held in a cache. With a small file it is quite possible that on the first read the entire file is held in cache or a buffer in memory, so the second read is very fast.
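    You can see this effect for yourself with Time::HiRes. A minimal sketch (the file layout — a 4-byte big-endian header length, then header, then body — is assumed from the question; it builds a throwaway test file rather than using your real data):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(gettimeofday tv_interval);

# Build a small test file: 4-byte header length, header, body
# (layout assumed from the question; adjust to your real format).
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
my $header = 'H' x 100;
print {$tmp} pack('N', length $header), $header, 'B' x 1000;
close $tmp;

open my $fh, '<:raw', $file or die "Can't open $file: $!";

my $t0 = [gettimeofday];
read $fh, my $len_bytes, 4;              # first read: may pay for the disk seek
my $first = tv_interval($t0);
my $header_len = unpack 'N', $len_bytes;

$t0 = [gettimeofday];
read $fh, my $rest, $header_len + 1000;  # later reads: usually served from cache
my $second = tv_interval($t0);
close $fh;

printf "first read: %.6fs  second read: %.6fs\n", $first, $second;
```

    On a cold cache the first read of a file tends to dwarf the rest, which matches the timings you posted.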
Re: Reading files too slow
by BrowserUk (Patriarch) on Jan 14, 2009 at 11:59 UTC

    If you're re-reading files during a single run, and the files are smallish (say <= 50k average), then it might make sense to slurp them (read each entire file as a single string) into an array or hash. 10k * 50k == 500MB, which is well within the reach of most machines these days.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Reading files too slow
by tilly (Archbishop) on Jan 14, 2009 at 12:07 UTC
    The first read probably does 2 disk seeks (one to find out where the file is, one to read the file). The time for a disk seek is set by the rotation speed of your hard drive.

    When it does the first read it caches more of the file "just in case". So when you read the rest you're reading from cache, which is fast.

      Besides the programming issues: modern OSes and hardware try to guess what "might come next", so I also think the 2nd read is from cache. Also keep in mind that, depending on the "cluster size" (I hope I got the right word, cmiiw) of your filesystem, more may already have been read than just the very small portion of the file you access (512 bytes is the minimum addressed by a hard disk), so tuning your filesystem could also help.
      Last but not least, faster hardware could be an option: performance-wise, software RAID is surprisingly not far behind hardware RAID on modern systems. If you can spend the money, do — I/O is the bottleneck in most cases. I also wonder whether one could hold some of the files (maybe the most often accessed) in a RAM disk, but I have no experience with whether this is possible ...

      hth MH
Re: Reading files too slow
by Fletch (Bishop) on Jan 14, 2009 at 13:39 UTC

    And you don't mention the OS or how your files are laid out, but be aware that on many *NIX variants some filesystems have problems with large numbers of files in a single directory (FSVO "large" that varies by filesystem). For (say) an ext2 fs, 10,000 would count, and you'd notice a good bit of latency for any operation on the containing directory.

    A common workaround is to break the files up into subdirectories keyed by part of the filename or something derived from the filename (e.g. if the filenames were 5 digit numbers, have 10 top level directories 0..9, each of which has its own subdirectories 0..9, so the file 12345.dat would be located in 1/2/12345.dat).
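    A sketch of that bucketing scheme (the sub name and the "data" root are made up for the example; it assumes filenames start with at least two digits):

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: map "12345.dat" under $root to "$root/1/2/12345.dat"
# by using the first two characters of the filename as directory buckets.
sub bucketed_path {
    my ($root, $filename) = @_;
    my $d1 = substr $filename, 0, 1;
    my $d2 = substr $filename, 1, 1;
    return File::Spec->catfile($root, $d1, $d2, $filename);
}

print bucketed_path('data', '12345.dat'), "\n";
```

    With two levels of 0..9 buckets, 10,000 files spread out to roughly 100 per directory, which keeps directory lookups cheap on filesystems without hashed/indexed directories.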

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.