in reply to Re^7: Working with large amount of data
in thread Working with large amount of data

But splitting directly into 100 files may be a horrible idea for the simple reason that disk drives are typically able to stream data at high rates to only a fixed number of locations at once, like 4 or 16.

Are you completely sure about that?

AFAIK the OS and the file system layer should mitigate any hardware limitation like that. Writes are cached and reordered before they are sent to the hardware, so there shouldn't be any difference between writing to one file or to a thousand...

Well, unless you have your file system configured to commit every write immediately (e.g. a sync mount), but that is not common because of the huge performance penalty it imposes!
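For reference, an application can also opt into that immediate-commit behavior on its own, without a special mount. A minimal sketch in Python (the file name is made up for the demo):

```python
import os
import tempfile

# O_SYNC asks the kernel to commit each write to stable storage before the
# write() call returns -- the "commit everything immediately" mode described
# above. (POSIX; the mount-wide equivalent on Linux is the `sync` option.)
path = os.path.join(tempfile.gettempdir(), "sync_demo.log")  # hypothetical name
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
os.write(fd, b"one synchronous record\n")
os.close(fd)
```

With O_SYNC every write pays the full latency of the device, which is exactly why nobody runs a whole file system that way by default.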


Re^9: Working with large amount of data
by tilly (Archbishop) on Sep 22, 2009 at 15:13 UTC
    I'm not completely sure about that. I had some bad experiences with Linux and disk drives a decade ago that have left me suspicious of how good the OS is at caching and reordering stuff. Things are certainly better now, but how much better I do not know.

    Put it this way: if I were solving this problem on this hardware, I'd be sure to do some trial runs on smaller data sets. And one thing I'd test is how many pieces to split a file into in one pass, because it could matter.
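The trial run suggested above is easy to sketch: split one input file into N output files and time each fan-out width. This is Python rather than Perl, and the file names, record format, and sizes are all made up for the demo; round-robin stands in for whatever key a real job would split on.

```python
import os
import tempfile
import time

def split_file(src_path, out_dir, n_pieces):
    """Distribute the lines of src_path across n_pieces files, round-robin."""
    outs = [open(os.path.join(out_dir, f"part_{i:03d}.txt"), "w")
            for i in range(n_pieces)]
    try:
        with open(src_path) as src:
            for lineno, line in enumerate(src):
                outs[lineno % n_pieces].write(line)
    finally:
        for f in outs:
            f.close()

def trial_run(n_lines=100_000):
    """Time splitting one synthetic file into 4, 16, and 100 pieces."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.txt")
        with open(src, "w") as f:
            for i in range(n_lines):
                f.write(f"record {i}\n")
        timings = {}
        for n in (4, 16, 100):
            out_dir = os.path.join(tmp, f"split_{n}")
            os.mkdir(out_dir)
            t0 = time.perf_counter()
            split_file(src, out_dir, n)
            timings[n] = time.perf_counter() - t0
        return timings
```

If the 100-way split is dramatically slower than the 4-way split on the target hardware, a multi-pass split (e.g. 10 files per pass, twice) is worth considering; if the timings are close, the caching argument above wins and one pass is fine.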