BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:
...and can you assess its big-O efficiency?
The algorithm merges, in-place, $G sorted sub-partitions of an array into a fully sorted array.
Graphically this shows the algorithm merging 3, 4-element sub-partitions (warning: best viewed with a wide screen):
C:\test>inplaceMerge -N=12 -G=3 Starting point: pA=0 p[i]=4 [231, 584, 675, 787] [165, 177, 714, 838] [451, 456, 714, 887] tem +p = d[pA]:231, d[pA] = d[p[i]]:165, d[pB-1] = d[pB]:177, d[--pB] = te +mp:231 pA=1 p[i]=4 [165, 584, 675, 787] [177, 231, 714, 838] [451, 456, 714, 887] tem +p = d[pA]:584, d[pA] = d[p[i]]:177, d[pB-1] = d[pB]:231, d[--pB] = te +mp:584 pA=2 p[i]=4 [165, 177, 675, 787] [231, 584, 714, 838] [451, 456, 714, 887] tem +p = d[pA]:675, d[pA] = d[p[i]]:231, d[pB-1] = d[pB]:584, d[--pB] = te +mp:675 pA=3 p[i]=4 [165, 177, 231, 787] [584, 675, 714, 838] [451, 456, 714, 887] tem +p = d[pA]:787, d[pA] = d[p[i]]:584, d[pB-1] = d[pB]:675, d[pB-1] = d +[pB]:714, d[--pB] = temp:787 pA=0 p[i]=8 [165, 177, 231, 584] [675, 714, 787, 838] [451, 456, 714, 887] pA=1 p[i]=8 [165, 177, 231, 584] [675, 714, 787, 838] [451, 456, 714, 887] pA=2 p[i]=8 [165, 177, 231, 584] [675, 714, 787, 838] [451, 456, 714, 887] pA=3 p[i]=8 [165, 177, 231, 584] [675, 714, 787, 838] [451, 456, 714, 887] tem +p = d[pA]:548, d[pA] = d[p[i]]:451, d[pB-1] = d[pB]:456, d[--pB] = te +mp:458 pA=4 p[i]=8 [165, 177, 231, 451] [675, 714, 787, 838] [456, 584, 714, 887] tem +p = d[pA]:675, d[pA] = d[p[i]]:456, d[pB-1] = d[pB]:584, d[--pB] = te +mp:675 pA=5 p[i]=8 [165, 177, 231, 451] [456, 714, 787, 838] [584, 675, 714, 887] tem +p = d[pA]:714, d[pA] = d[p[i]]:584, d[pB-1] = d[pB]:675, d[--pB] = te +mp:714 pA=6 p[i]=8 [165, 177, 231, 451] [456, 584, 787, 838] [675, 714, 714, 887] tem +p = d[pA]:787, d[pA] = d[p[i]]:675, d[pB-1] = d[pB]:714, d[pB-1] = d[ +pB]:714, d[--pB] = temp:787 pA=7 p[i]=8 [165, 177, 231, 451] [456, 584, 675, 838] [714, 714, 787, 887] tem +p = d[pA]:838, d[pA] = d[p[i]]:714, d[pB-1] = d[pB]:714, d[pB-1] = d[ +pB]:787, d[--pB] = temp:838
There is a brief description of the algorithm in words, in the comments below:
#! perl -slw use strict; use Data::Dump qw[ pp ]; $Data::Dump::WIDTH = 1000; our $N //= 400; our $G //= 4; my $GSize = int( $N / $G ); our $S //= 1; srand( $S ); my @p = map $_ * $GSize, 0 .. $G-1; #pp \@p; my @d = map int( rand 1000 ), 1 .. $N; sub show{ print pp [ @d[ $_ .. $_ + $GSize-1 ] ] for @p; print''; } splice @d, $_, $GSize, sort{ $a <=> $b } @d[ $_ .. $_ + $GSize-1 ] for + @p; show; ## The algorithm starts here; ## with @d containing $G sorted subpartitions. ## and @p the start indexes of those sub-partitions ## and $GSize containing the length of the sub-partitions. ## Note:If the command line args partition the array into $G+1 partiti +ons; ## eg. -N=100 -G=7 given partitions [0..13][14..27][28..41][42..55][56 +..69][70..83][84..98][99] ## the last 'extra' partition is ignored. (For now.) ## merge each partition after the first for my $i ( 1 .. $#p ) { ## against all the following partitions starting with the first for( my $pA = 0; $pA < $p[ $i ]; ++$pA ) { ## $pA indexes to the next element to be merged ## if it is less the top element of the partition being merged if( $d[ $pA ] > $d[ $p[ $i ] ] ) { my $temp = $d[ $pA ]; ## Keep a copy of it $d[ $pA ] = $d[ $p[ $i ] ]; ## swap the lower element +into place my $pB; ## external declaration al +lows $pB to remember where it finished when the next loop terminates ## Insertion sort the element being swapped out ($temp) in +to the right place in the partition being merged with for( $pB = $p[ $i ]+1; $pB < ( $p[ $i ]+$GSize ) && $d[ $p +B ] < $temp; ++$pB ) { $d[ $pB-1 ] = $d[ $pB ]; }; ## Put the swapped out element into the gap we opened up $d[ --$pB ] = $temp; } # show; <STDIN>; } #show; } show;
Notes:
Searches for "in-place merges of sorted sub-partitions" turn up lots of people asking, but few good answers and lots of "just sort them together" -- completely impractical for my larger-than memory datasets.
Ive tried to relate what I'm doing to the formal literature (eg. http://comjnl.oxfordjournals.org/content/38/8/681.full.pdf, which seems to be very pertinent), but despite the time I've spent reading this sort of paper, I still find passages like:
Block Exchanges
A block exchange of two consecutive blocks U and V of sizes l1 and l2, respectively, that occupy segment L[c..c + l1 + l2 - 1 ] result in the movement of block U which occupies segment L[c..c + l1 - 1 ] to L[c + l2..c + l1 + l2 - 1 ], and in the movement of V initially in segment L[c + l1..c + l1 + l2 - 1 ] to segment L[c..c + l2 - 1 ]
as a description of: [...uuuuVVVVVVVVV...] becomes [...VVVVVVVVVuuuu...], entirely opaque for the first half dozen readings. Even the included figure doesn't help much. (Why oh why are they written this way?) Even the choice of U & V seems almost deliberate to make things hard to read.
Improvements to this algorithm (as opposed to the Perl implementation). Feels like there ought to be a way to avoid starting pA at the beginning for each extra partition.
Better algorithms. I've looked, but maybe someone knows of one.
Better approaches to the overall problem.
Any assessment of the complexity. I've tried 3 times and gotton 3 different answers.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Can you improve upon my algorithm.
by bitingduck (Deacon) on Mar 12, 2015 at 05:29 UTC | |
by BrowserUk (Patriarch) on Mar 12, 2015 at 14:48 UTC | |
|
Re: Can you improve upon my algorithm.
by bitingduck (Deacon) on Mar 12, 2015 at 04:16 UTC | |
|
Re: Can you improve upon my algorithm.
by atcroft (Abbot) on Mar 12, 2015 at 04:43 UTC | |
by BrowserUk (Patriarch) on Mar 12, 2015 at 14:08 UTC | |
by bitingduck (Deacon) on Mar 12, 2015 at 15:01 UTC | |
|
Re: Can you improve upon my algorithm.
by salva (Canon) on Mar 12, 2015 at 10:11 UTC | |
by BrowserUk (Patriarch) on Mar 12, 2015 at 13:09 UTC | |
by salva (Canon) on Mar 12, 2015 at 22:04 UTC | |
by BrowserUk (Patriarch) on Mar 13, 2015 at 04:40 UTC | |
by salva (Canon) on Mar 13, 2015 at 10:06 UTC | |
| |
by bitingduck (Deacon) on Mar 12, 2015 at 15:22 UTC |