in reply to Re^7: Can you improve upon my algorithm.
in thread Can you improve upon my algorithm.

Be patient!

I didn't say when :)

FWIW: after killing the process I had left running, I made one change and re-ran it:

my $nextRec = substr( $bufs[ $r ], -1024, 1024, '' ); # 1/64th iterations/buffer.

And it completed in under 10 minutes:

E:\junk>c:\t-salvaTheory.pl
Took 492 seconds.


Re^9: Can you improve upon my algorithm.
by salva (Canon) on Mar 13, 2015 at 23:28 UTC
    Here we go!

    n-way-merge.c

    On my old home computer it merges at around 3GB/min, 100GB in half an hour.

    As it is just a proof of concept, it can only handle records made up of integers of type int64_t.

    The meanings of the constants are as follows:

    N         = maximum number of files that can be sorted
    CHUNKSIZE = the per-file memory buffer
    KEYSIZE   = the number of int64_t numbers per record
    The source files are passed as arguments on the command line and the sorted output is sent to STDOUT. Only tested on 64-bit Ubuntu.
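
    For concreteness, here is a minimal sketch of how those constants and the command-line/STDOUT contract might look. The names N, CHUNKSIZE and KEYSIZE come from the description above; everything else is my own illustrative code, not the actual n-way-merge.c:

        #include <stdint.h>
        #include <stdio.h>

        #define N         64               /* maximum number of input files    */
        #define CHUNKSIZE (1024 * 1024)    /* per-file memory buffer, in bytes */
        #define KEYSIZE   1                /* int64_t values per record        */
        #define RECSIZE   (KEYSIZE * sizeof(int64_t))

        int main(int argc, char **argv) {
            if (argc < 2 || argc - 1 > N) {
                fprintf(stderr, "usage: %s file1 ... fileN  (at most %d files)\n", argv[0], N);
                return 1;
            }
            FILE *in[N];
            for (int i = 1; i < argc; i++) {   /* one stream per pre-sorted chunk file */
                in[i - 1] = fopen(argv[i], "rb");
                if (!in[i - 1]) { perror(argv[i]); return 1; }
            }
            /* ... fill one CHUNKSIZE-byte buffer per file, then run the heap-based
               merge, writing the winning RECSIZE-byte record to stdout each time ... */
            return 0;
        }

    Built and run along these lines (the file names are just placeholders):

        gcc -O3 -o n-way-merge n-way-merge.c
        ./n-way-merge chunk-*.sorted > merged.out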

    Update: It is difficult to count the number of instructions inside the loop on the hot path. It is certainly below 100, but I don't know whether I have managed to reach the 50 I predicted... I blame gcc -O3 for that, because it unrolls small loops such as the key comparison (sketched below).
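
    For reference, this is the kind of key comparison I mean; it is my own reconstruction, not the code from n-way-merge.c. With KEYSIZE a small compile-time constant, gcc -O3 will usually unroll the loop completely, which is what makes the per-record instruction count hard to pin down:

        #include <stdint.h>

        #define KEYSIZE 1   /* int64_t values per record, as above */

        /* Lexicographic comparison of two KEYSIZE-long records.  With a small
           constant KEYSIZE, gcc -O3 typically unrolls this loop entirely. */
        static inline int key_cmp(const int64_t *a, const int64_t *b) {
            for (int i = 0; i < KEYSIZE; i++) {
                if (a[i] != b[i])
                    return (a[i] < b[i]) ? -1 : 1;
            }
            return 0;
        }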

      Case and analysis proven.

      Setting KEYSIZE = 1, compiling with /Ox and running it against my 50 x 1GB files, it sustains a pretty steady 60MB/s in and 60MB/s out at a CPU utilisation of ~22-23%. Given that my wc -c baseline test of 171MB/s is unidirectional, sustaining ~120MB/s of combined, bidirectional IO probably means it is IO-bound rather than CPU-bound -- testimony to a well-implemented heap.

      Total runtime for 50GB was a gnat's over 14 minutes, which comes out at ~61MB/s each way. Very impressive.

      Just makes you wonder about gnu/ms sort!

      I'll try to add the sort that produces the split files into it tomorrow.


        Case and analysis proven

        Actually, there was an error in the analysis. I predicted the in-memory merge would be RAM-bound because of the log2(N) comparisons per record all going to RAM to fetch the keys. That is wrong: the records at the front of the buffers stay in the L1 cache, and only the replacement for the one at the top of the heap has to be loaded from RAM. So it is actually CPU-bound.
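
        To illustrate (my own sketch, not the code from n-way-merge.c): each output step replaces the heap root with the next record from the same buffer and sifts it down. The records it is compared against on the way down were already at the front of their buffers on earlier iterations, so they are hot in L1; the only access likely to go out to RAM is fetching that single replacement record. Here front[] and key_cmp() are assumed helpers, not names from the real program:

            #include <stdint.h>

            #define KEYSIZE 1

            /* front[i] points at the record currently at the head of input buffer i. */
            extern const int64_t *front[];

            static inline int key_cmp(const int64_t *a, const int64_t *b) {
                for (int k = 0; k < KEYSIZE; k++)
                    if (a[k] != b[k]) return (a[k] < b[k]) ? -1 : 1;
                return 0;
            }

            /* Min-heap of buffer indices.  After the root's buffer gets a fresh front
               record, restore the heap property.  Every record read below, except the
               fresh one, was already touched on earlier iterations and sits in L1. */
            static void sift_down(int *heap, int n) {
                int i = 0;
                for (;;) {
                    int l = 2 * i + 1, r = l + 1, smallest = i;
                    if (l < n && key_cmp(front[heap[l]], front[heap[smallest]]) < 0) smallest = l;
                    if (r < n && key_cmp(front[heap[r]], front[heap[smallest]]) < 0) smallest = r;
                    if (smallest == i) break;
                    int tmp = heap[i]; heap[i] = heap[smallest]; heap[smallest] = tmp;
                    i = smallest;
                }
            }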

        I'll try to add the sort that produces the split files into it tomorrow.

        I used my own Sort::Packed for that.