Re: Configurable IO buffersize?
by ikegami (Patriarch) on Jul 31, 2011 at 10:20 UTC
sysread doesn't use the buffer, so it's not limited by it. You could surely use tie to make a read use sysread.
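Here is a minimal sketch of the tie approach. It assumes a hypothetical package name `ChunkedReader` (not an existing module) and plain `\n` line endings, so `$/` is ignored. Every refill is one `sysread` of a caller-chosen size, which means the PerlIO buffer size never comes into play. The demo uses a deliberately tiny chunk so that some lines straddle refills:

```perl
use strict;
use warnings;

package ChunkedReader;    # hypothetical name, not a CPAN module

sub TIEHANDLE {
    my ($class, $path, $chunk) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    return bless { fh => $fh, chunk => $chunk || 65536, buf => '', eof => 0 }, $class;
}

# Append one chunk to the private buffer; returns bytes read (0 at EOF).
sub _fill {
    my $self = shift;
    return 0 if $self->{eof};
    my $n = sysread $self->{fh}, $self->{buf}, $self->{chunk}, length $self->{buf};
    die "sysread: $!" unless defined $n;
    $self->{eof} = 1 if $n == 0;
    return $n;
}

# Scalar context: next "\n"-terminated line (or the unterminated tail).
# List context: all remaining lines.
sub READLINE {
    my $self = shift;
    if (wantarray) {
        my @lines;
        while (defined(my $line = $self->READLINE)) { push @lines, $line }
        return @lines;
    }
    while (1) {
        my $i = index $self->{buf}, "\n";
        return substr($self->{buf}, 0, $i + 1, '') if $i >= 0;
        last unless $self->_fill;
    }
    return if $self->{buf} eq '';           # nothing left: undef
    my $tail = $self->{buf};
    $self->{buf} = '';
    return $tail;
}

sub EOF {
    my $self = shift;
    $self->_fill if $self->{buf} eq '';
    return $self->{buf} eq '';
}

sub CLOSE { my $self = shift; return close $self->{fh} }

package main;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} map { "line $_\n" } 1 .. 1000;
print {$tmp} "tail";                        # last line has no newline
close $tmp or die "close: $!";

# A deliberately tiny chunk so lines straddle refill boundaries.
tie *IN, 'ChunkedReader', $path, 7;
my @got;
while (my $line = <IN>) { push @got, $line }
close IN;
printf "%d lines, last is '%s'\n", scalar @got, $got[-1];
```

In real use the chunk would be 64K or more, e.g. `tie *IN, 'ChunkedReader', $file, 1 << 16;`. A fuller version would also need `$/` handling, `READ`/`GETC` and seeking.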
Since 5.14, it's 8k and configurable when Perl is built.
The previous default size of a PerlIO buffer (4096 bytes) has been increased to the larger of 8192 bytes and your local BUFSIZ. Benchmarks show that doubling this decade-old default increases read and write performance by around 25% to 50% when using the default layers of perlio on top of unix. To choose a non-default size, such as to get back the old value or to obtain an even larger value, configure with:
./Configure -Accflags=-DPERLIOBUF_DEFAULT_BUFSIZ=N
where N is the desired size in bytes; it should probably be a multiple of your page size.
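One way to check whether a particular perl was built with a non-default size is to look for the define in `%Config`. This assumes the `-Accflags` value ends up in `$Config{ccflags}`, which is where Configure normally records extra compiler flags. The 4096-byte page size used for the sanity check is also an assumption:

```perl
use strict;
use warnings;
use Config;

# Extract N from a ccflags string containing -DPERLIOBUF_DEFAULT_BUFSIZ=N;
# returns undef when the define is absent (i.e. the default was used).
sub configured_bufsiz {
    my ($ccflags) = @_;
    my ($n) = $ccflags =~ /-DPERLIOBUF_DEFAULT_BUFSIZ=(\d+)/;
    return $n;
}

my $size = configured_bufsiz($Config{ccflags});
if (defined $size) {
    # 4096 is an assumed page size; substitute your platform's value.
    printf "PerlIO buffer size: %d bytes%s\n", $size,
        $size % 4096 ? " (not a multiple of 4096)" : "";
}
else {
    print "No PERLIOBUF_DEFAULT_BUFSIZ in ccflags; this perl uses its compiled-in default\n";
}
```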
Since 5.14, it's 8k and configurable when Perl is built.
For my current project I need to read from up to 100 files concurrently.
I've demonstrated that on Windows, when reading a single file, 64k reads work out to be the most efficient. I've also proved to myself that when processing input from multiple files concurrently (interleaved), even bigger read sizes reduce the number of seeks between file positions and can give substantial gains.
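Here is a rough sketch of that interleaved pattern. It assumes each file gets its own private buffer, refilled by a single large `sysread` (1 MiB here, an arbitrary illustrative value). The helper names `next_line`, `interleave` and `make_file` are hypothetical. How much seeking this actually saves depends on the OS cache and read-ahead:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the next line for one file's state, refilling that file's private
# buffer with one large sysread only when no complete line is buffered.
sub next_line {
    my ($st) = @_;
    while (1) {
        my $i = index $st->{buf}, "\n";
        return substr($st->{buf}, 0, $i + 1, '') if $i >= 0;
        my $n = sysread $st->{fh}, $st->{buf}, $st->{chunk}, length $st->{buf};
        die "sysread $st->{path}: $!" unless defined $n;
        next if $n > 0;
        return if $st->{buf} eq '';          # EOF and nothing left
        my $tail = $st->{buf};                # EOF with an unterminated last line
        $st->{buf} = '';
        return $tail;
    }
}

# Round-robin one line at a time across all files until every file is drained.
sub interleave {
    my ($chunk, @paths) = @_;
    my @states = map {
        open my $fh, '<:raw', $_ or die "open $_: $!";
        +{ fh => $fh, path => $_, chunk => $chunk, buf => '' };
    } @paths;
    my @out;
    while (@states) {
        my @live;
        for my $st (@states) {
            my $line = next_line($st);
            next unless defined $line;        # this file is finished
            push @out, $line;
            push @live, $st;
        }
        @states = @live;
    }
    return @out;
}

# Write the given strings to a temporary file and return its path.
sub make_file {
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} @_;
    close $fh or die "close: $!";
    return $path;
}

my $pa = make_file("a1\n", "a2\n", "a3\n");
my $pb = make_file("b1\n", "b2\n");
my @lines = interleave(1 << 20, $pa, $pb);    # 1 MiB per sysread
print @lines;
```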
Compile-time configuration doesn't really cut it. Would you use a module that required you to re-build Perl?
You could surely use tie to make a read use sysread.
Indeed, I've been hand-coding sliding buffers with adaptations to specific usages for years, but I thought I saw mention of a module that would allow all the usual line-oriented usage of filehandles, whilst sysread/syswrite-ing configurable-sized chunks from/to disk.
I can write one, but writing a fully-fledged, all-singing/dancing generic module takes a lot of time and thought. I'm surprised it doesn't already exist.
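The write half of such a module might look something like this sketch. It is a tied handle (hypothetical name `ChunkedWriter`, not an existing module) that accumulates `print` output and hands it to `syswrite` in fixed-size chunks, then flushes the remainder on `close`. It ignores `$,` and `$\`:

```perl
use strict;
use warnings;

package ChunkedWriter;    # hypothetical name, not a CPAN module

sub TIEHANDLE {
    my ($class, $fh, $chunk) = @_;
    return bless { fh => $fh, chunk => $chunk || 65536, buf => '', writes => 0 }, $class;
}

# Write whole chunks while at least one full chunk is buffered; when $all
# is true, also write the final partial chunk.
sub _drain {
    my ($self, $all) = @_;
    while (length($self->{buf}) >= $self->{chunk} || ($all && length($self->{buf}))) {
        my $len = length $self->{buf};
        $len = $self->{chunk} if $len > $self->{chunk};
        my $n = syswrite $self->{fh}, $self->{buf}, $len;
        die "syswrite failed: $!" unless $n;
        substr($self->{buf}, 0, $n, '');      # drop what was written (may be partial)
        $self->{writes}++;
    }
    return;
}

sub PRINT {
    my $self = shift;
    $self->{buf} .= join '', @_;              # $, and $\ are ignored in this sketch
    $self->_drain(0);
    return 1;
}

sub PRINTF {
    my $self = shift;
    my $fmt  = shift;
    return $self->PRINT(sprintf $fmt, @_);
}

sub CLOSE {
    my $self = shift;
    $self->_drain(1);
    return close $self->{fh};
}

package main;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "close: $!";
open my $raw, '>:raw', $path or die "open $path: $!";

tie *OUT, 'ChunkedWriter', $raw, 4096;
my $writer = tied *OUT;
print OUT "x\n" for 1 .. 10_000;              # 20,000 bytes in 2-byte prints
close OUT or die "close: $!";
printf "%d syswrite calls, %d bytes on disk\n", $writer->{writes}, -s $path;
```

With a 4096-byte chunk, the 20,000 bytes above go out in five `syswrite` calls: four full chunks plus the 3,616-byte remainder at close.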
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Would you use a module that required you to re-build Perl?
You don't need a module that requires you to rebuild Perl; you want to tweak Perl so it works better for your special needs, and it's actually very common to tweak production systems by rebuilding components instead of using out-of-the-box settings.
Note: I'm not defending the lack of ability to set this more conveniently.
I don't know if this approach is applicable to your situation or not, but it sounds like performance is important enough that a lot of hassle might be ok. If true, then I would try adjusting things such that the file system will always read a minimum of 64KB no matter what.
The way to do this is by adjusting what Microsoft calls the cluster size (the "allocation unit size"), which other file systems call the block size. This is the smallest unit of storage that NTFS will read/write on the disk, and it will be contiguous. Doing this requires that you create a separate logical drive and format it using the /A: option of the format command:
FORMAT <drive>: /FS:NTFS /A:<clustersize>
where clustersize = 65536 (64K), the largest NTFS cluster size at the time of writing.
So this drive is used like any other, except that every file on it will take a minimum of 64K of space on the disk (even for a 1 byte file).
I have not benchmarked this on Windows NTFS, but I have on other OSes/file systems. I predict significant performance gains.
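The space cost is easy to estimate, because each file is rounded up to a whole number of clusters. The small sketch below uses a hypothetical `allocated` helper. Its model is simplified: it ignores NTFS keeping very small files resident in their MFT record, so in practice tiny files may use no clusters at all.

```perl
use strict;
use warnings;

# Bytes a file of $size occupies when space is handed out in whole clusters.
# Simplified model: ignores NTFS storing very small files inside their MFT
# record, as well as compression and sparse files.
sub allocated {
    my ($size, $cluster) = @_;
    return $cluster * int(($size + $cluster - 1) / $cluster);
}

for my $size (1, 100_000) {
    printf "%7d bytes -> %7d with 4K clusters, %7d with 64K clusters\n",
        $size, allocated($size, 4096), allocated($size, 65536);
}
```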
Re: Configurable IO buffersize?
by jwkrahn (Abbot) on Jul 31, 2011 at 00:31 UTC
...but - as it says in the synopsis of the module - "setvbuf is not available by default on Perls 5.8.0 and later". In other words, setvbuf doesn't work with perls that have been compiled with PerlIO.
See also 4k read buffer is too small.
Thanks. It's a shame it no longer works.
(I have to wonder about the real benefits of PerlIO. It seems to be an insanely complex, and yet woefully incomplete, re-implementation of stdio, that bypasses both vendor and platform specific optimisations, for not much gain.)
Re: Configurable IO buffersize?
by Anonymous Monk on Aug 02, 2011 at 02:19 UTC
This *ought* to work, though it's untested; I just copy/pasted the relevant bits from PerlIOBuf_get_base :) Edit perl-5.15.1/dist/IO/IO.xs:
void
setbuf(handle, ...)
    OutputStream handle
  CODE:
    if (handle)
#ifdef PERLIO_IS_STDIO
    {
        char *buf = items == 2 && SvPOK(ST(1)) ?
          sv_grow(ST(1), BUFSIZ) : 0;
        setbuf(handle, buf);
    }
#else
    {   /* not_here("IO::Handle::setbuf"); */
        PerlIOBuf * const b = PerlIOSelf(f, PerlIOBuf);
        PERL_UNUSED_CONTEXT;
        if (items == 2 && SvPOK(ST(1)))
        {
            b->bufsiz = SvLEN(ST(1));
            Newxz(b->buf, b->bufsiz, STDCHAR);
        }
    }
#endif
The transformation to setvbuf would be similar, but I'll let you work out the logic :)
Thank you kind Sir.
OK, now I've tested it: it works and the concept is proved, but there are some issues. For example, using :raw causes PerlIO to croak, probably something about the XS I'm doing wrong :)
I used ->getbufsiz to test whether ->setbuf had any effect, because Devel::Peek/Data::Peek was no help with handles, and trying to parse perliol.h with Convert::Binary::C wasn't happening.
This definitely needs official support.
Cheers
Bah, typo (f not handle), still untested :)
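void
setbuf(handle, ...)
    OutputStream handle
  CODE:
    if (handle)
#ifdef PERLIO_IS_STDIO
    {
        char *buf = items == 2 && SvPOK(ST(1)) ?
          sv_grow(ST(1), BUFSIZ) : 0;
        setbuf(handle, buf);
    }
#else
    {   /* replaces not_here("IO::Handle::setbuf"); PerlIOBuf and
           PerlIOSelf come from perliol.h. Assumes the top layer is a
           buffering layer (e.g. :perlio or :crlf), not bare :unix. */
        PerlIOBuf * const b = PerlIOSelf(handle, PerlIOBuf);
        if (items == 2 && SvPOK(ST(1)))
        {
            /* Flush pending writes (for reads, PerlIO seeks back over
               unread data; on a pipe that data would still be lost),
               then free the old buffer instead of leaking it. The
               one-word emergency buffer is not heap memory. */
            PerlIO_flush(handle);
            if (b->buf && b->buf != (STDCHAR *) &b->oneword)
                Safefree(b->buf);
            b->bufsiz = SvLEN(ST(1));
            Newxz(b->buf, b->bufsiz, STDCHAR);
            b->ptr = b->end = b->buf;   /* new buffer starts out empty */
        }
    }
#endif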