Re^5: read and sort multiple files

Replies are listed 'Best First'.
Re^6: read and sort multiple files by matrixmadhan (Beadle) on Dec 05, 2008 at 18:47 UTC
Yes, that's right. Sorting on the split chunks isn't free as well as there is additional 'n-1' open and close involved with the extra 'n-1' files when working with split chunks of a file	[reply]
Re^7: read and sort multiple files by ikegami (Patriarch) on Dec 06, 2008 at 08:04 UTC
What extra open and closes? Grouping files greatly cuts down on the number of opens and closes. In the case of a few small files, say 10 files, 100 of which fit in memory at a time. Without grouping: To sort: 210 opens To merge: 3(10-1) opens Total: 47 With grouping: To sort: 10 + ceil(10/100) opens To merge: 3(ceil(10/100)-1) opens Total: 11 In the case of lots of small files, say 100 files, 10 of which fit in memory at a time. Without grouping: To sort: 2100 opens To merge: 3(100-1) opens Total: 497 With grouping: To sort: 100 + ceil(100/10) opens To merge: 3(ceil(100/10)-1) opens Total: 137 In the case of big files, it simply won't work without splitting. You could cut down on the number of opens and closes by merging more than two files at a time, but it's not relevant here because it would apply to both grouping and non-grouping algorithms.	[reply] [d/l]
Re^8: read and sort multiple files by matrixmadhan (Beadle) on Dec 06, 2008 at 14:52 UTC
I think I haven't understood what you are trying to say. This is what I thought, 10 files - 100 of which can fit in memory which means that all the 10 can be loaded into memory at once. `10 open operations for 10 files load data in to memory from all the 10 files 10 close operations for 10 files sort everything in memory itself 1 - open for output file flush sorted data from memory to output file 1 - close for output file` [download] For the above example, its 2 * ( n + 1 ) close and open operations So, in this case this doesn't tally with what you have said. Apologies again, if I haven't understood what you meant.	[reply] [d/l]
Re^9: read and sort multiple files by ikegami (Patriarch) on Dec 06, 2008 at 15:19 UTC