in reply to Re^2: Configurable IO buffersize?
in thread Configurable IO buffersize?

I don't know if this approach is applicable to your situation or not, but it sounds like performance is important enough that a lot of hassle might be ok. If true, then I would try adjusting things such that the file system will always read a minimum of 64KB no matter what.

The way to do this is by adjusting what Microsoft calls the cluster size (other file systems call it the extent or allocation-unit size). This is the smallest unit of storage that NTFS will read/write on the disk, and it will be contiguous. Doing this requires that you create a separate logical drive and format it using the /A: option of the format command:
FORMAT <drive>: /FS:NTFS /A:<clustersize>
A clustersize of 65536 is the maximum.
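
To verify the cluster size of an existing NTFS volume (an optional check, assuming the standard Windows fsutil utility is available):
fsutil fsinfo ntfsinfo <drive>:
and look for the "Bytes Per Cluster" line in its output.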

Such a drive is used like any other, except that every file on it will take a minimum of 64K of space on the disk (even a 1-byte file).

I have not benchmarked this on Windows/NTFS, but I have on other OS/file-system combinations. I predict significant performance gains.

Re^4: Configurable IO buffersize?
by BrowserUk (Patriarch) on Aug 01, 2011 at 16:22 UTC
    The way to do this is by adjusting ... the cluster size,

    Whilst this approach might actually benefit my application to some extent, it would -- even more so than re-compiling perl to use bigger buffers -- be an extremely heavy-handed way of achieving those gains.

    Reconfiguring the file system to benefit one application, without considering the effects of that change on everything else, would be a very drastic step. For example, the OS uses 4k pages of virtual memory and backs those virtual pages with clusters of memory-mapped physical disk (all executable files are loaded using memory-mapped IO). What would be the effect of having 4k virtual pages backed by 64k disk clusters?

    But in any case, the scenario I describe is not so unusual, nor something unique to my machine. Think of every time you use an external sort program on a huge file. These work by first reading the source file sequentially and writing partially sorted chunks to temporary files, then merging those temporary files. In the second stage they are interleaving reads from multiple source files and writing to the output file. The exact scenario I described above.
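
    For concreteness, here is a bare-bones sketch of that two-phase pattern -- purely an illustration, with hypothetical file names, line-oriented records, and no attention paid to tuning:

    use strict;
    use warnings;

    my $CHUNK_LINES = 1_000_000;   # records per partially sorted chunk

    # Stage 1: read the source sequentially, write sorted chunks to temp files.
    open my $in, '<', 'huge.dat' or die "open: $!";
    my( @chunks, @lines );
    while( 1 ) {
        my $line = <$in>;
        push @lines, $line if defined $line;
        if( @lines == $CHUNK_LINES or ( !defined( $line ) and @lines ) ) {
            my $tmp = sprintf 'chunk%03d.tmp', scalar @chunks;
            open my $out, '>', $tmp or die "open: $!";
            print {$out} sort @lines;
            close $out;
            push @chunks, $tmp;
            @lines = ();
        }
        last unless defined $line;
    }
    close $in;

    # Stage 2: merge the chunks -- interleaved reads from every temp file
    # while writing the output; exactly the access pattern described above.
    my @fhs   = map{ open my $fh, '<', $_ or die "open: $!"; $fh } @chunks;
    my @heads = map{ scalar <$_> } @fhs;
    open my $out, '>', 'huge.sorted' or die "open: $!";
    while( grep defined, @heads ) {
        my( $min ) = sort{ $heads[ $a ] cmp $heads[ $b ] }
                     grep{ defined $heads[ $_ ] } 0 .. $#heads;
        print {$out} $heads[ $min ];
        $heads[ $min ] = readline $fhs[ $min ];
    }
    close $out;
    unlink @chunks;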

    Perl's inability to set the buffer size used for buffered IO on a file-by-file basis is a real and distinct limitation.
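
    The usual workaround -- a sketch only, with a hypothetical file name and an arbitrary 1MB buffer -- is to bypass the buffered layer entirely and do your own large reads with sysread, carving records out of a user-level buffer:

    use strict;
    use warnings;

    my $BUFSIZE = 1024 * 1024;   # read 1MB at a time instead of perl's default

    open my $fh, '<:raw', 'huge.dat' or die "open: $!";
    my $buf = '';
    while( 1 ) {
        my $got = sysread( $fh, $buf, $BUFSIZE, length $buf );
        die "sysread: $!" unless defined $got;
        last if $got == 0 and $buf eq '';

        # peel off complete lines; keep any trailing partial line for next read
        while( $buf =~ s/^(.*\n)// ) {
            my $line = $1;
            # ... process $line ...
        }
        if( $got == 0 ) {        # EOF with a final, unterminated line
            # ... process $buf ...
            last;
        }
    }
    close $fh;

    That trades convenience for control: you get the read size you want, but you lose readline() and have to manage record boundaries yourself.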


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Reconfiguring the file system to benefit one application, without considering the effects of that change on everything else, would be a very drastic step.
      I guess the idea wasn't clear. I certainly was not recommending reconfiguring the whole system, or even a whole physical disk -- just some area for processing these special files. With current technology, using 64K as the system's "default" cluster size would be a very bad idea -- agreed!

      Sorry it didn't work out for you. I figured it was worth mentioning because it was a no-code solution.

        just some area for processing these special files.

        I'll say it again: there is nothing "special" about these files, nor about the processing being performed on them. You sort all types of files for all sorts of reasons, and my application is not dissimilar.

        Would you install a (say) Sort::External module that required you to repartition your drive so you could re-format a chunk of it with bigger clusters?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.