in reply to Re^4: read and sort multiple files
in thread read and sort multiple files

This cannot be guaranteed, what if the file is too big

Can't be. It sorts the split chunks, not the original files.

Replies are listed 'Best First'.
Re^6: read and sort multiple files
by matrixmadhan (Beadle) on Dec 05, 2008 at 18:47 UTC
    Yes, that's right.

    Sorting on the split chunks isn't free as well as there is additional 'n-1' open and close involved with the extra 'n-1' files when working with split chunks of a file

      What extra open and closes? Grouping files greatly cuts down on the number of opens and closes.

      • In the case of a few small files, say 10 files, 100 of which fit in memory at a time.

        • Without grouping:
          To sort: 2*10 opens
          To merge: 3*(10-1) opens
          Total: 47

        • With grouping:
          To sort: 10 + ceil(10/100) opens
          To merge: 3*(ceil(10/100)-1) opens
          Total: 11

      • In the case of lots of small files, say 100 files, 10 of which fit in memory at a time.

        • Without grouping:
          To sort: 2*100 opens
          To merge: 3*(100-1) opens
          Total: 497

        • With grouping:
          To sort: 100 + ceil(100/10) opens
          To merge: 3*(ceil(100/10)-1) opens
          Total: 137

      • In the case of big files, it simply won't work without splitting.

      You could cut down on the number of opens and closes by merging more than two files at a time, but it's not relevant here because it would apply to both grouping and non-grouping algorithms.

        I think I haven't understood what you are trying to say.

        This is what I thought, 10 files - 100 of which can fit in memory which means that all the 10 can be loaded into memory at once.
        10 open operations for 10 files load data in to memory from all the 10 files 10 close operations for 10 files sort everything in memory itself 1 - open for output file flush sorted data from memory to output file 1 - close for output file
        For the above example, its 2 * ( n + 1 ) close and open operations

        So, in this case this doesn't tally with what you have said.

        Apologies again, if I haven't understood what you meant.