in reply to Re: [OT] A measure of 'sortedness'?
in thread [OT] A measure of 'sortedness'?
Depending on the actual size of your buffer in comparison to, say, the number of indexes you have (I don't know what you mean by 10%, exactly)
As an indication: the two buffers might be 2GB each.
The 10% was to indicate that I don't have enough physical memory to allocate another 4GB in which to do a normal n-way merge.
When merging in place, a typical requirement is that there are two segments, 1 in each of the buffers, that need to be interchanged:
ab cd ef gh [..............xxxxxxx....................][......yyyyyyyyyyy......... +...............]
Above, the two buffers are in 'mid merge'. All the valued f..g are > a and < d; and all the values b..c are > e and < h.
Typically, this would be done by exchanging all the xs with ys:
af d eb c gh [..............yyyyyyy....................][......xxxxxxxyyyy......... +...............]
Then rotating the whole section d..c forward 1 at a time:
af d eb c gh [..............yyyyyyyy....................][......xxxxxxxyyy......... +...............] af d eb c gh [..............yyyyyyyyy...................][.......xxxxxxxyy......... +...............] af d eb cgh [..............yyyyyyyyyy..................][........xxxxxxxy......... +...............] af gd eb ch [..............yyyyyyyyyyy.................][.........xxxxxxx......... +...............]
Which means all the records between original position of d, and the final position of c, have to move forward 1, 4 times.
Which doesn't look so bad when the difference is just 4 and length is 20 or so, but if the difference is a few hundred or more, and the length (say) 1/2GB, it is very costly. n00 x 1/2GB.
If instead, you can copy the 4 (or few hundred) to an out-of-line buffer, move the 20+ (or 1/2GB) forward by the length of that buffer just once, then put the ool buffer back, it saves lots of time.
And if the out-of-line buffer (say 2MB) is smaller than the residual for the move (which will be a very rare event), then you do the move in two (or more) chunks. You've still divided the total number of record swaps by 2 million.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^3: [OT] A measure of 'sortedness'?
by marinersk (Priest) on Mar 18, 2015 at 23:27 UTC | |
by BrowserUk (Patriarch) on Mar 18, 2015 at 23:53 UTC | |
by marinersk (Priest) on Mar 19, 2015 at 00:31 UTC | |
by BrowserUk (Patriarch) on Mar 19, 2015 at 03:31 UTC | |
by marinersk (Priest) on Mar 19, 2015 at 15:18 UTC | |
by marinersk (Priest) on Mar 19, 2015 at 20:33 UTC | |
| |
|
Re^3: [OT] A measure of 'sortedness'?
by sauoq (Abbot) on Mar 19, 2015 at 15:51 UTC | |
by BrowserUk (Patriarch) on Mar 19, 2015 at 21:18 UTC | |
by marinersk (Priest) on Mar 19, 2015 at 19:04 UTC |