in reply to [OT] A measure of 'sortedness'?
Q: How to sort lots of data.
A: Using mergesort.
Let's say we have a dataset amounting to D, and M is the size that can be comfortably sorted in-memory. Now the first step suggests itself: divide data up in n = D / M chunks, sort each one separately. This much is already on the blueprints, I'd imagine.
The second step is to merge the n stripes. Merge pass is a streaming operation, it does not require any storage per se. Though for efficient handling, fill buffers are practical.
N-way merge may be accomplished via network of two-way merges. E.g. 4-way merge might be defined as follows:
Those merge junctions assemble like drainage pipes. An n-way merge requires n-1 junctions, and 2n-1 fill buffers for I/O. The second pass will read and write all of the data records for the second time.merge2(merge2(,), merge2(,))
As for the implementation: AVX2 intrinsics should prove handy with vectors of 64bit integers. Fill buffers in megabytes might be needed with mechanical storage.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: [OT] A measure of 'sortedness'?
by BrowserUk (Patriarch) on Mar 20, 2015 at 17:44 UTC | |
by Anonymous Monk on Mar 20, 2015 at 18:14 UTC | |
by BrowserUk (Patriarch) on Mar 20, 2015 at 19:32 UTC |