
Whilst the data originates on disk and is bigger than memory, the two buffers being merged here are both fully in memory, but combined are close to the limits of memory, hence not enough space to perform the n-way merge.

If the overall pair of data sets is bigger than memory, and so has to be processed in chunks that fit in memory, why not smaller chunks?

Re^4: [OT] A measure of 'sortedness'?
by BrowserUk (Patriarch) on Mar 19, 2015 at 18:27 UTC
    why not smaller chunks?

    Because eventually, you need to merge the smaller chunks into bigger ones. That includes the ones bigger than memory, but by splitting into 1/2 memory sized chunks, you can merge them in pairs:

    A B C D   Each 2GB
    A&B
    B&C
    C&D       The largest have migrated from A to D, no need to revisit
    B&C
    A&B       The smallest have migrated from D to A, no need to revisit
    B&C       And final pass ensures everything is in place.

    Using smaller buffers only delays the inevitable and increases the number of passes.
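
    The six-step schedule above can be sketched in Python. This is only an illustration under assumptions: the names `merge_split` and `pairwise_merge_passes` are hypothetical, the chunks are small in-memory lists rather than 2GB buffers, and `heapq.merge` into a temporary list stands in for the in-place merge the thread is actually about.

```python
import heapq

def merge_split(lo, hi):
    """Merge two sorted lists, then split the result back into two lists
    of the original lengths: smaller values first, larger values second.
    (Assumption: uses a temporary list, so this is NOT an in-place merge.)"""
    merged = list(heapq.merge(lo, hi))
    return merged[:len(lo)], merged[len(lo):]

def pairwise_merge_passes(chunks):
    """Sort four chunks individually, then apply the pairwise schedule
    A&B, B&C, C&D, B&C, A&B, B&C from the post."""
    a, b, c, d = (sorted(ch) for ch in chunks)
    a, b = merge_split(a, b)
    b, c = merge_split(b, c)
    c, d = merge_split(c, d)  # the largest values are now all in d
    b, c = merge_split(b, c)
    a, b = merge_split(a, b)  # the smallest values are now all in a
    b, c = merge_split(b, c)  # final pass settles b and c
    return [a, b, c, d]

if __name__ == "__main__":
    print(pairwise_merge_passes([[9, 3, 7], [1, 8, 2], [6, 12, 4], [11, 5, 10]]))
    # prints [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
```

    Each call only ever touches two chunks at a time, which is the point of sizing the chunks at half of memory.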


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      I understand that, but I still wonder if the performance boost afforded by simplifying the "shuffling around" could more than offset the extra overhead of more merges.

        No.

        If you sort A, B, C and D, then merge A+B and C+D, you still need to merge AB + CD. Better to have sorted AB and CD, and do one merge. I.e. 4 sorts and 3 merges versus 2 sorts and 1 merge.
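
        To make the operation counts concrete, here is a rough Python sketch that tallies sorts and merges for both strategies. The helper names (`counted_sort`, `counted_merge`, `quarters_then_merge`, `halves_then_merge`) are hypothetical, and the sketch ignores the memory constraint entirely; it only counts steps and does not measure their cost.

```python
import heapq
from collections import Counter

def counted_sort(xs, ops):
    """Sort a list and record that one sort was performed."""
    ops["sorts"] += 1
    return sorted(xs)

def counted_merge(xs, ys, ops):
    """Merge two sorted lists and record that one merge was performed."""
    ops["merges"] += 1
    return list(heapq.merge(xs, ys))

def quarters_then_merge(data):
    """Sort A, B, C, D separately, merge A+B and C+D, then merge AB+CD."""
    ops = Counter()
    q = len(data) // 4
    a = counted_sort(data[:q], ops)
    b = counted_sort(data[q:2 * q], ops)
    c = counted_sort(data[2 * q:3 * q], ops)
    d = counted_sort(data[3 * q:], ops)
    ab = counted_merge(a, b, ops)
    cd = counted_merge(c, d, ops)
    return counted_merge(ab, cd, ops), ops

def halves_then_merge(data):
    """Sort AB and CD as two larger chunks, then do a single merge."""
    ops = Counter()
    h = len(data) // 2
    ab = counted_sort(data[:h], ops)
    cd = counted_sort(data[h:], ops)
    return counted_merge(ab, cd, ops), ops

if __name__ == "__main__":
    data = [5, 2, 9, 1, 7, 3, 8, 6]
    for strategy in (quarters_then_merge, halves_then_merge):
        result, ops = strategy(data)
        print(strategy.__name__, result, dict(ops))
```

        Both strategies produce the same sorted output; only the tallies differ.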

