princepawn has asked for the wisdom of the Perl Monks concerning the following question:

Most people seem to choose a read size of 256 or 1024 bytes, but I never really knew why. As part of my effort to implement seek for FTP in a relatively transparent way, I figured I might just do a huge 4-megabyte read from my FTP server before passing back the filehandle, to emulate a seek command, since you cannot seek on a filehandle returned from Net::FTP. But if necessary, I will do several small reads and throw away the results until I reach the desired offset.
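A minimal sketch of the read-and-discard approach described above, assuming a handle that supports Perl's read(); the helper name and the 8K chunk size are my own, not anything Net::FTP provides:

```perl
use strict;
use warnings;

# Hypothetical helper: emulate seek() by reading and discarding bytes
# in small chunks until the handle is positioned at $offset.
sub skip_to_offset {
    my ($fh, $offset) = @_;
    my $buf;
    my $remaining = $offset;
    while ($remaining > 0) {
        my $want = $remaining < 8192 ? $remaining : 8192;
        my $got  = read($fh, $buf, $want);
        die "handle exhausted before reaching offset" unless $got;
        $remaining -= $got;
    }
    return $offset;
}

# Demonstration on an in-memory filehandle:
my $data = join '', 'a' .. 'z';
open my $fh, '<', \$data or die $!;
skip_to_offset($fh, 10);
read($fh, my $rest, 16);
print "$rest\n";    # prints "klmnopqrstuvwxyz"
```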

Replies are listed 'Best First'.
(tye)Re: how to choose read size for read() and sysread()
by tye (Sage) on May 22, 2001 at 09:59 UTC

    Most hard disks won't allow you to read less than 512 bytes at a time. Most file systems won't allow you to read less than around 4K or 8K bytes at a time. The operating system (and/or C RTL) hide these facts from you, but that just means that when you ask for 1 byte of a file, 8K of the file is read by the operating system and 1 byte of that 8K is given to you.

    So reading a power of two multiple of 4K bytes at a time is usually a good thing... to a point...

    Reading 4MB is probably a pretty dumb idea, especially if you just plan to throw all (or even most) of that data away. Allocating a 4MB buffer just isn't a trivial thing on most systems. Then as the buffer is filled, pages of memory have to be allocated to slowly cover the huge chunk of virtual address space you managed to reserve. It is much more efficient, in most cases, to just reuse a few dozen pages of RAM for reading reasonably sized chunks of data that you plan to throw away.
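tye's point about reusing a small buffer rather than allocating one sized to the whole offset can be sketched like this (the 8K chunk size and the temp-file setup are illustrative assumptions):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Discard $count bytes by reusing one small buffer, instead of
# allocating a single buffer as large as the whole offset.
sub discard_bytes {
    my ($fh, $count) = @_;
    my $buf;                          # one small buffer, reused every pass
    my $left = $count;
    while ($left > 0) {
        my $want = $left < 8192 ? $left : 8192;
        my $got  = sysread($fh, $buf, $want);
        die "EOF before $count bytes discarded" unless $got;
        $left -= $got;
    }
}

# Demonstration with a temporary file:
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} 'x' x 100_000;
close $out;

open my $in, '<:raw', $path or die $!;
discard_bytes($in, 65_536);
my $got = sysread($in, my $tail, 65_536);
print "bytes left after discard: $got\n";    # 100_000 - 65_536 = 34_464
```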

            - tye (but my friends call me "Tye")
Re: how to choose read size for read() and sysread()
by lemming (Priest) on May 22, 2001 at 04:19 UTC
    This is something that I haven't done much under Perl, but have done more under C. I think the theory may still be valid, in that you want to grab a big enough chunk to keep the number of disk accesses low. If you grab too much, you may start swapping. Also, grabbing powers of 2 was kinder with respect to crossing chunk boundaries (i.e., grabbing 1050 bytes was about as fast as grabbing 2000 in some cases). But that depends on disk layouts as well.
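    The chunk-size effect lemming describes can be measured with a quick benchmark; the 4MB file size and the particular chunk sizes here are my own choices for illustration:

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Temp qw(tempfile);

# Read a whole file in fixed-size chunks; returns total bytes read.
sub read_in_chunks {
    my ($path, $chunk) = @_;
    open my $fh, '<:raw', $path or die $!;
    my ($buf, $total) = ('', 0);
    while (my $got = sysread($fh, $buf, $chunk)) {
        $total += $got;
    }
    close $fh;
    return $total;
}

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} 'x' x (4 * 1024 * 1024);    # a 4MB test file
close $fh;

# Compare an odd chunk size against the next power of two.
timethese(20, {
    'chunk_1050' => sub { read_in_chunks($path, 1050) },
    'chunk_2048' => sub { read_in_chunks($path, 2048) },
});
```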