I am looking for a way to increase the 4k read() size that strace shows at the system-call level. Yes, I've read Re^3: Perl Read-Ahead I/O Buffering, and I respectfully disagree that 4k is enough for everybody. After all, in C you can use setvbuf to set your buffer sizes, and in C++ you can use a complex streambuf statement to increase the buffer size. Occasionally you have situations where a larger buffer makes sense.
Motivation:
While Stevens's APUE, chapter 3, shows no significant improvement for buffers over 4k, this data is likely 20 years old. My current disk space approaches 0.1 PB. The file system uses 64k blocks. Worse, the persistent storage is delivered via NFS (albeit inodes and data are served from different physical machines, with multiple such pairs for the various mount points). Occasionally, I need to read large files that exceed the local compute node disk space, so I am forced to read them from NFS.
Now, each read() that shows up in strace will incur 1 NFS request. If I have an 8 GB file that I read in 4k chunks in 200 parallel processes, my computations will issue roughly 400 million NFS requests. As you can imagine, the few other users with whom I share the fs are angry with me for slowing the servers to a crawl (ever waited 1 min for a "cd some/where"?), and the admins are decidedly unhappy, too. The admins actually suggested reading the file in 8M chunks, which would issue only ~200000 NFS requests. Of course, my solution was to copy that file to the local disk of the compute node, and then compute from there. But sometimes I have files that are larger than the local scratch space, and I am thus forced to read directly from NFS.
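For the record, the arithmetic behind those two figures, assuming binary units (8 GB = 2^33 bytes, 4k = 2^12 bytes, 8M = 2^23 bytes) and that each of the 200 processes reads the whole file once:

\[
\frac{2^{33}}{2^{12}} = 2^{21} = 2\,097\,152 \ \text{reads per process}, \qquad 200 \times 2\,097\,152 = 419\,430\,400 \approx 4 \times 10^{8}
\]

\[
\frac{2^{33}}{2^{23}} = 2^{10} = 1024 \ \text{reads per process}, \qquad 200 \times 1024 = 204\,800 \approx 2 \times 10^{5}
\]

So the larger chunk size cuts the number of read() calls by a factor of 2^11 = 2048.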
Since PerlIO's setvbuf has been disabled, I wonder: how do I set a larger read buffer size in Perl, so that the read()s seen by strace use more than 4k? Even if it does not make my Perl programs run faster, it would put less load on the NFS server, and thus cause less frustration for the admins, who have to deal with the annoyed users.
I've trawled the web for some time, and couldn't really find an applicable solution for increasing Perl's read buffer size. I've written a FullyBuffered module using sysreads within an object, bypassing regular PerlIO, but it feels slow and does not integrate nicely with PerlIO handles; e.g. occasionally, I do need the utf8 layer. I'd be loath to recompile my Perl to make the default read buffer size larger, though I would be willing to do so, with good instructions, if that is what it takes.
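To make the sysread approach concrete, here is a minimal sketch of the idea. It is not the actual FullyBuffered module: the sub name each_line_chunked, the callback interface, and the 8 MiB default are all made up for illustration. It requests data in large chunks and splits lines out of the buffer by hand.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are illustrative only):
# call $on_line->($line) for every line of $path, fetching the data
# with sysread in requests of up to $chunk bytes (default 8 MiB).
sub each_line_chunked {
    my ($path, $on_line, $chunk) = @_;
    $chunk ||= 8 * 1024 * 1024;

    open(my $fh, '<:raw', $path) or die "open $path: $!";
    my $buf = '';
    while (1) {
        # Append at the end of $buf so an unfinished line is kept.
        my $n = sysread($fh, $buf, $chunk, length $buf);
        die "sysread $path: $!" unless defined $n;
        last if $n == 0;    # end of file

        # Hand out every complete line currently in the buffer.
        my $start = 0;
        while ((my $nl = index($buf, "\n", $start)) >= 0) {
            $on_line->(substr($buf, $start, $nl - $start + 1));
            $start = $nl + 1;
        }
        substr($buf, 0, $start, '');    # keep only the unfinished tail
    }
    $on_line->($buf) if length $buf;    # last line without a newline
    close($fh) or die "close $path: $!";
}
```

It would be called as each_line_chunked($file, sub { my ($line) = @_; ... }). The lines come back as raw bytes, which is exactly the integration problem: where I need UTF-8 I have to decode every line myself (e.g. with Encode::decode('UTF-8', $line)) instead of just pushing a layer onto the handle. Note also that sysread may return fewer bytes than requested, so $chunk is only an upper bound on each read().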
I'd really appreciate some insight into increasing the read buffer size.
Thank you,
Jens.
In reply to 4k read buffer is too small by voeckler