Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi everybody! I have about 10_000 files and my program reads each of them from time to time. All files have the following structure: header length, header and body. It is strange, but reading the length of the header takes about 0.012 sec (I use simple code for it: read(HANDLE,$length,4)), while reading the header & body takes about 0.005 sec. Reading 4 bytes takes more time than the whole file. Why?

Replies are listed 'Best First'.
Re: Reading files too slow
by cdarke (Prior) on Jan 14, 2009 at 11:16 UTC
    Probably buffering and/or cache. On most systems there is an optimum chunk size to be read from a disk, usually measured in KB rather than bytes. So, reading 4 bytes will probably read a lot more, and that extra data will be held in a cache. With a small file it is quite possible that on the first read the entire file is held in cache or a buffer in memory, so the second read is very fast.
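    You can see this effect for yourself with Time::HiRes. A minimal sketch (the file layout — a 4-byte big-endian header length, then header, then body — is assumed from the question; it builds a throwaway test file rather than using your real data):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(gettimeofday tv_interval);

# Build a small test file: 4-byte header length, header, body
# (layout assumed from the question; adjust to your real format).
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
my $header = 'H' x 100;
print {$tmp} pack('N', length $header), $header, 'B' x 1000;
close $tmp;

open my $fh, '<:raw', $file or die "Can't open $file: $!";

my $t0 = [gettimeofday];
read $fh, my $len_bytes, 4;              # first read: may pay for the disk seek
my $first = tv_interval($t0);
my $header_len = unpack 'N', $len_bytes;

$t0 = [gettimeofday];
read $fh, my $rest, $header_len + 1000;  # later reads: usually served from cache
my $second = tv_interval($t0);
close $fh;

printf "first read: %.6fs  second read: %.6fs\n", $first, $second;
```

    On a cold cache the first read of a file tends to dwarf the rest, which matches the timings you posted.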
Re: Reading files too slow
by BrowserUk (Patriarch) on Jan 14, 2009 at 11:59 UTC

    If you're re-reading files during a single run, and the files are smallish (say <= 50k average), then it might make sense to slurp them (read each entire file as a single string) into an array or hash. 10k * 50k == 500MB, which is well within the reach of most machines these days.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Reading files too slow
by tilly (Archbishop) on Jan 14, 2009 at 12:07 UTC
    The first read probably does 2 disk seeks (one to find out where the file is, one to read the file). The time for a disk seek is set by the rotation speed of your hard drive.

    When it does the first read it caches more of the file "just in case". So when you read the rest you're reading from cache, which is fast.

      Besides the programming issues: modern OSes and hardware try to guess what "might come next", so I also think the 2nd read is from cache. Also keep in mind that, depending on the "cluster size" (I hope I got the right word, cmiiw) of your filesystem, more may already have been read than just the very small portion of the file you access (512 bytes is the minimum addressed by a hard disk), so tuning your filesystem could also help.
      Last but not least, faster hardware could be an option: performance-wise, software RAID is surprisingly not far behind hardware RAID on modern systems. If you can spend the money, do — I/O is the bottleneck in most cases. I also wonder whether one could hold some of the files (maybe the most often accessed) in a RAM disk, but I have no experience with whether this is possible ...

      hth MH
Re: Reading files too slow
by Fletch (Bishop) on Jan 14, 2009 at 13:39 UTC

    And you don't mention the OS or how your files are laid out, but be aware that on many *NIX variants some filesystems have problems with large numbers of files in a single directory (FSVO "large" that varies by filesystem). For (say) an ext2 fs, 10,000 would count, and you'd notice a good bit of latency for any operation on the containing directory.

    A common workaround is to break the files up into subdirectories keyed by part of the filename or something derived from the filename (e.g. if the filenames were 5 digit numbers, have 10 top level directories 0..9, each of which has its own subdirectories 0..9, so the file 12345.dat would be located in 1/2/12345.dat).
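    A sketch of that bucketing scheme (the sub name and the "data" root are made up for the example; it assumes filenames start with at least two digits):

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: map "12345.dat" under $root to "$root/1/2/12345.dat"
# by using the first two characters of the filename as directory buckets.
sub bucketed_path {
    my ($root, $filename) = @_;
    my $d1 = substr $filename, 0, 1;
    my $d2 = substr $filename, 1, 1;
    return File::Spec->catfile($root, $d1, $d2, $filename);
}

print bucketed_path('data', '12345.dat'), "\n";
```

    With two levels of 0..9 buckets, 10,000 files spread out to roughly 100 per directory, which keeps directory lookups cheap on filesystems without hashed/indexed directories.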

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.