Q: How to sort lots of data.
A: Using mergesort.
Let's say we have a dataset amounting to D, and M is the size that can be comfortably sorted in-memory. Now the first step suggests itself: divide data up in n = D / M chunks, sort each one separately. This much is already on the blueprints, I'd imagine.
The second step is to merge the n stripes. Merge pass is a streaming operation, it does not require any storage per se. Though for efficient handling, fill buffers are practical.
N-way merge may be accomplished via network of two-way merges. E.g. 4-way merge might be defined as follows:
Those merge junctions assemble like drainage pipes. An n-way merge requires n-1 junctions, and 2n-1 fill buffers for I/O. The second pass will read and write all of the data records for the second time.merge2(merge2(,), merge2(,))
As for the implementation: AVX2 intrinsics should prove handy with vectors of 64bit integers. Fill buffers in megabytes might be needed with mechanical storage.
In reply to Re: [OT] A measure of 'sortedness'?
by Anonymous Monk
in thread [OT] A measure of 'sortedness'?
by BrowserUk
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |