Re^3: Processing large file using threads

It sounds like the major bottleneck in this process is going to be reading the data from disk. If so, the best way to speed it up would be to break the file into a few chunks and run the job on separate machines, one chunk per machine.

You can then merge the output from each machine at the end of the process.
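
To make the split-and-merge idea concrete, here is a rough sketch in Python (not taken from the original thread). It assumes the file is line-oriented, so each chunk should end on a line boundary. The names chunk_offsets and count_lines are made up for illustration, and counting lines only stands in for whatever the real per-chunk job does.

```python
import os

def chunk_offsets(path, n_chunks):
    """Return (start, end) byte ranges covering the file, each ending on a line boundary."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    offsets = [0]
    with open(path, "rb") as fh:
        for i in range(1, n_chunks):
            target = size * i // n_chunks
            if target <= offsets[-1]:
                continue  # a long line already carried us past this target
            fh.seek(target)
            fh.readline()  # move forward to the start of the next line
            pos = fh.tell()
            if pos >= size:
                break
            if pos > offsets[-1]:
                offsets.append(pos)
    offsets.append(size)
    return list(zip(offsets[:-1], offsets[1:]))

def count_lines(path, start, end):
    """Hypothetical per-chunk job: count the lines in bytes [start, end)."""
    count = 0
    with open(path, "rb") as fh:
        fh.seek(start)
        while fh.tell() < end:
            if not fh.readline():
                break
            count += 1
    return count
```

Each (start, end) pair could be handed to a different machine along with a copy of that part of the file. The merge step here would just be summing the per-chunk counts, for example sum(count_lines(path, s, e) for s, e in chunk_offsets(path, 4)). A real job would need its own merge logic.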


Replies are listed 'Best First'.
Re^4: Processing large file using threads by zentara (Cardinal) on May 08, 2007 at 16:44 UTC
I agree with this. There have been a few posts here in the past showing that the OS already optimizes file reads, so it doesn't do much good to split the file and process the chunks in different threads of the same process. Disk I/O will still be the bottleneck. Using different machines is a good idea. Or, if you were on a fast SCSI system and could put the different chunks on different SCSI disks, that might do some good.

I'm not really a human, but I play one on earth. Cogito ergo sum a bum
