...and can you assess its big-O efficiency?
The algorithm merges, in-place, $G sorted sub-partitions of an array into a fully sorted array.
Graphically, this shows the algorithm merging three 4-element sub-partitions (warning: best viewed with a wide screen):
There is a brief description of the algorithm in words, in the comments below:
#! perl -slw
use strict;
use Data::Dump qw[ pp ];
$Data::Dump::WIDTH = 1000;

our $N //= 400;     ## total array size
our $G //= 4;       ## number of sub-partitions
my $GSize = int( $N / $G );
our $S //= 1;
srand( $S );

my @p = map $_ * $GSize, 0 .. $G-1;    ## start indexes of the sub-partitions
#pp \@p;
my @d = map int( rand 1000 ), 1 .. $N;

sub show {
    print pp [ @d[ $_ .. $_ + $GSize-1 ] ] for @p;
    print '';
}

## Sort each sub-partition individually
splice @d, $_, $GSize, sort { $a <=> $b } @d[ $_ .. $_ + $GSize-1 ] for @p;
show;

## The algorithm starts here:
## @d contains $G sorted sub-partitions,
## @p holds the start indexes of those sub-partitions,
## and $GSize is the length of each sub-partition.
## Note: if the command line args partition the array into $G+1 partitions,
## eg. -N=100 -G=7 gives partitions [0..13][14..27][28..41][42..55][56..69][70..83][84..97][98..99],
## the last 'extra' partition is ignored. (For now.)

## Merge each partition after the first
for my $i ( 1 .. $#p ) {
    ## against everything that precedes it, starting with the first element
    for( my $pA = 0; $pA < $p[ $i ]; ++$pA ) {
        ## $pA indexes the next element to be merged;
        ## swap if it is greater than the top (smallest) element of the partition being merged in
        if( $d[ $pA ] > $d[ $p[ $i ] ] ) {
            my $temp = $d[ $pA ];        ## Keep a copy of it
            $d[ $pA ] = $d[ $p[ $i ] ];  ## Swap the lower element into place
            my $pB;  ## Declared outside the loop so $pB remembers where the scan finished
            ## Insertion-sort the swapped-out element ($temp) into
            ## the right place in the partition being merged with
            for( $pB = $p[ $i ]+1; $pB < $p[ $i ]+$GSize && $d[ $pB ] < $temp; ++$pB ) {
                $d[ $pB-1 ] = $d[ $pB ];
            }
            ## Put the swapped-out element into the gap we opened up
            $d[ --$pB ] = $temp;
        }
        # show; <STDIN>;
    }
    #show;
}
show;
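For anyone who wants to convince themselves the merge actually works, here is a minimal, self-contained sketch (my small fixed example, not the original's random data) that runs the same merge-with-insertion scheme on three sorted 4-element runs and verifies the result is fully sorted:

```perl
use strict;
use warnings;

# Three sorted runs of 4 elements each; @p holds their start indexes.
my $GSize = 4;
my @p = ( 0, 4, 8 );
my @d = ( 2, 5, 7, 9,  1, 3, 8, 11,  0, 4, 6, 10 );

# Same merge scheme as above: walk everything before partition $i,
# swapping any larger element with the partition's smallest, then
# insertion-sorting the swapped-out element into that partition.
for my $i ( 1 .. $#p ) {
    for ( my $pA = 0; $pA < $p[ $i ]; ++$pA ) {
        if ( $d[ $pA ] > $d[ $p[ $i ] ] ) {
            my $temp = $d[ $pA ];
            $d[ $pA ] = $d[ $p[ $i ] ];
            my $pB;
            for ( $pB = $p[ $i ] + 1;
                  $pB < $p[ $i ] + $GSize && $d[ $pB ] < $temp;
                  ++$pB ) {
                $d[ $pB - 1 ] = $d[ $pB ];
            }
            $d[ --$pB ] = $temp;
        }
    }
}

# Verify the merged array is sorted end to end.
$d[ $_ ] <= $d[ $_ + 1 ] or die "unsorted at $_" for 0 .. $#d - 1;
print "@d";    # 0 1 2 3 4 5 6 7 8 9 10 11
```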
Notes:
Searches for "in-place merges of sorted sub-partitions" turn up lots of people asking, but few good answers and lots of "just sort them together" -- completely impractical for my larger-than-memory datasets.
I've tried to relate what I'm doing to the formal literature (eg. http://comjnl.oxfordjournals.org/content/38/8/681.full.pdf, which seems to be very pertinent), but despite the time I've spent reading this sort of paper, I still find passages like:
Block Exchanges
A block exchange of two consecutive blocks U and V of sizes l1 and l2, respectively, that occupy segment L[c..c + l1 + l2 - 1 ] result in the movement of block U which occupies segment L[c..c + l1 - 1 ] to L[c + l2..c + l1 + l2 - 1 ], and in the movement of V initially in segment L[c + l1..c + l1 + l2 - 1 ] to segment L[c..c + l2 - 1 ]
as a description of: [...uuuuVVVVVVVVV...] becomes [...VVVVVVVVVuuuu...], entirely opaque for the first half dozen readings. Even the included figure doesn't help much. (Why oh why are they written this way?) Even the choice of U & V seems almost deliberate to make things hard to read.
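In plainer terms, the passage just says two adjacent blocks swap places. One standard way to do that in place is the three-reversals trick (my sketch, not the paper's method): reverse U, reverse V, then reverse the whole segment.

```perl
use strict;
use warnings;

# Block exchange [ U V ] -> [ V U ] via three in-place reversals.
# Hypothetical helper, not taken from the paper: $c is the segment
# start, $l1 = |U|, $l2 = |V|.
sub block_exchange {
    my ( $aref, $c, $l1, $l2 ) = @_;
    @{$aref}[ $c .. $c + $l1 - 1 ] =
        reverse @{$aref}[ $c .. $c + $l1 - 1 ];                 # reverse U
    @{$aref}[ $c + $l1 .. $c + $l1 + $l2 - 1 ] =
        reverse @{$aref}[ $c + $l1 .. $c + $l1 + $l2 - 1 ];     # reverse V
    @{$aref}[ $c .. $c + $l1 + $l2 - 1 ] =
        reverse @{$aref}[ $c .. $c + $l1 + $l2 - 1 ];           # reverse both
}

my @L = qw( x u u u u V V V V V y );
block_exchange( \@L, 1, 4, 5 );    # swap the u-block and the V-block
print "@L";                        # x V V V V V u u u u y
```

Each reversal is trivially in-place, so the whole exchange needs O(1) extra space and touches each element at most twice.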
Improvements to this algorithm (as opposed to the Perl implementation). It feels like there ought to be a way to avoid starting $pA back at the beginning for each extra partition.
Better algorithms. I've looked, but maybe someone knows of one.
Better approaches to the overall problem.
Any assessment of the complexity. I've tried 3 times and gotten 3 different answers.
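On that last point, one way to pin the complexity down is to count operations empirically while varying $N (my instrumentation, not part of the original post). The outer two loops always execute sum( $p[$i] ) = Θ(N·G) comparison steps regardless of the data, and each swap can ripple up to $GSize moves through the insertion scan, which suggests a worst case of O(N·G·GSize) = O(N²):

```perl
use strict;
use warnings;

# Rough empirical cost model: count comparisons and element moves
# for the merge at a given $N and $G on random data.
sub merge_cost {
    my ( $N, $G ) = @_;
    my $GSize = int( $N / $G );
    my @p = map { $_ * $GSize } 0 .. $G - 1;
    my @d = map { int rand 1000 } 1 .. $N;
    @d[ $_ .. $_ + $GSize - 1 ] =
        sort { $a <=> $b } @d[ $_ .. $_ + $GSize - 1 ] for @p;

    my ( $cmps, $moves ) = ( 0, 0 );
    for my $i ( 1 .. $#p ) {
        for ( my $pA = 0; $pA < $p[ $i ]; ++$pA ) {
            ++$cmps;
            if ( $d[ $pA ] > $d[ $p[ $i ] ] ) {
                my $temp = $d[ $pA ];
                $d[ $pA ] = $d[ $p[ $i ] ];
                my $pB;
                for ( $pB = $p[ $i ] + 1;
                      $pB < $p[ $i ] + $GSize && $d[ $pB ] < $temp;
                      ++$pB ) {
                    ++$cmps;
                    ++$moves;
                    $d[ $pB - 1 ] = $d[ $pB ];
                }
                $d[ --$pB ] = $temp;
            }
        }
    }
    return ( $cmps, $moves );
}

srand( 1 );
for my $N ( 400, 800, 1600 ) {
    my ( $c, $m ) = merge_cost( $N, 4 );
    printf "N=%5d G=4: %8d comparisons, %8d moves\n", $N, $c, $m;
}
```

Doubling $N at fixed $G should roughly quadruple both counts if the O(N²) bound is tight for random data; plotting a few such runs is a quick sanity check on any hand-derived answer.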
In reply to Can you improve upon my algorithm. by BrowserUk