in reply to Re^2: [OT] A measure of 'sortedness'?
in thread [OT] A measure of 'sortedness'?

If you sort the chunks while generating, and then apply the n-way merge as described, the full procedure amounts to one 100GB write, plus another 100GB of reads and 100GB of writes during the merge: 300GB of streaming (external-memory) access in total.
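
By way of illustration, a minimal Perl sketch of that merge pass might look like the following. The chunk file naming, record length, and output file name are invented for the example, and a simple linear scan for the smallest record stands in for a proper heap:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Minimal sketch of the n-way merge pass. Assumes the sorted chunk
    # files already exist on disk; names and record length are invented.
    my $RECLEN = 128;                     # assumed fixed record length (bytes)
    my @files  = glob 'chunk.*.sorted';   # assumed chunk file naming

    # Open every chunk and prime it with its first record.
    my @streams;
    for my $file ( @files ) {
        open my $fh, '<:raw', $file or die "$file: $!";
        my $rec;
        push @streams, { fh => $fh, rec => $rec }
            if read( $fh, $rec, $RECLEN );
    }

    open my $out, '>:raw', 'merged.bin' or die "merged.bin: $!";

    # Repeatedly emit the smallest pending record; a linear scan is fine
    # for a handful of chunks, a heap pays off with hundreds of them.
    while ( @streams ) {
        my $min = 0;
        for my $i ( 1 .. $#streams ) {
            $min = $i if $streams[$i]{rec} lt $streams[$min]{rec};
        }
        print {$out} $streams[$min]{rec};
        unless ( read( $streams[$min]{fh}, $streams[$min]{rec}, $RECLEN ) ) {
            close $streams[$min]{fh};
            splice @streams, $min, 1;     # this chunk is exhausted
        }
    }
    close $out;

Each input chunk is read once and the output is written once, which is where the 100GB of reads and the second 100GB of writes come from.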

Are you searching for an algorithm that does better, or do you claim to have found one?

Re^4: [OT] A measure of 'sortedness'?
by BrowserUk (Patriarch) on Mar 20, 2015 at 19:32 UTC
    If you sort the chunks while generating,

    That presupposes that I'm generating the files.

    Are you searching for an algorithm that does better

    I'm writing a utility to sort (much) larger-than-memory files of fixed-length records.

    I already have a working version -- several, actually, each an improvement on the previous -- but they are still pretty slow. Far faster than my system sort utility, but there is still significant room for improvement.

    or do you claim to have found one?

    It is not hard to beat your local system sort utility for this purpose.

    To understand why, you really need to do as I suggested: generate a large, fixed-length-record binary file and try them for yourself.
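
    For anyone wanting to try it, a quick way to knock up such a test file might be the following; the record length, record count, and file name are only placeholders:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Quick-and-dirty generator for a fixed-length-record binary test
        # file; record length, record count and file name are placeholders.
        # Scale $COUNT up until the file dwarfs your RAM.
        my $RECLEN = 128;
        my $COUNT  = 1_000_000;           # ~128MB at these settings

        open my $out, '>:raw', 'bigfile.bin' or die "bigfile.bin: $!";
        for ( 1 .. $COUNT ) {
            print {$out} pack "C$RECLEN", map { int rand 256 } 1 .. $RECLEN;
        }
        close $out;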

