...and can you assess its big-O efficiency?

The algorithm merges, in-place, $G sorted sub-partitions of an array into a fully sorted array.

Graphically this shows the algorithm merging 3, 4-element sub-partitions (warning: best viewed with a wide screen):

There is a brief description of the algorithm in words, in the comments below:

#! perl -slw use strict; use Data::Dump qw[ pp ]; $Data::Dump::WIDTH = 1000; our $N //= 400; our $G //= 4; my $GSize = int( $N / $G ); our $S //= 1; srand( $S ); my @p = map $_ * $GSize, 0 .. $G-1; #pp \@p; my @d = map int( rand 1000 ), 1 .. $N; sub show{ print pp [ @d[ $_ .. $_ + $GSize-1 ] ] for @p; print''; } splice @d, $_, $GSize, sort{ $a <=> $b } @d[ $_ .. $_ + $GSize-1 ] for + @p; show; ## The algorithm starts here; ## with @d containing $G sorted subpartitions. ## and @p the start indexes of those sub-partitions ## and $GSize containing the length of the sub-partitions. ## Note:If the command line args partition the array into $G+1 partiti +ons; ## eg. -N=100 -G=7 given partitions [0..13][14..27][28..41][42..55][56 +..69][70..83][84..98][99] ## the last 'extra' partition is ignored. (For now.) ## merge each partition after the first for my $i ( 1 .. $#p ) { ## against all the following partitions starting with the first for( my $pA = 0; $pA < $p[ $i ]; ++$pA ) { ## $pA indexes to the next element to be merged ## if it is less the top element of the partition being merged if( $d[ $pA ] > $d[ $p[ $i ] ] ) { my $temp = $d[ $pA ]; ## Keep a copy of it $d[ $pA ] = $d[ $p[ $i ] ]; ## swap the lower element +into place my $pB; ## external declaration al +lows $pB to remember where it finished when the next loop terminates ## Insertion sort the element being swapped out ($temp) in +to the right place in the partition being merged with for( $pB = $p[ $i ]+1; $pB < ( $p[ $i ]+$GSize ) && $d[ $p +B ] < $temp; ++$pB ) { $d[ $pB-1 ] = $d[ $pB ]; }; ## Put the swapped out element into the gap we opened up $d[ --$pB ] = $temp; } # show; <STDIN>; } #show; } show;

Notes:

Searches for "in-place merges of sorted sub-partitions" turn up lots of people asking, but few good answers and lots of "just sort them together" -- completely impractical for my larger-than memory datasets.

Ive tried to relate what I'm doing to the formal literature (eg. http://comjnl.oxfordjournals.org/content/38/8/681.full.pdf, which seems to be very pertinent), but despite the time I've spent reading this sort of paper, I still find passages like:

Block Exchanges

A block exchange of two consecutive blocks U and V of sizes l1 and l2, respectively, that occupy segment L[c..c + l1 + l2 - 1 ] result in the movement of block U which occupies segment L[c..c + l1 - 1 ] to L[c + l2..c + l1 + l2 - 1 ], and in the movement of V initially in segment L[c + l1..c + l1 + l2 - 1 ] to segment L[c..c + l2 - 1 ]

as a description of: [...uuuuVVVVVVVVV...] becomes [...VVVVVVVVVuuuu...], entirely opaque for the first half dozen readings. Even the included figure doesn't help much. (Why oh why are they written this way?) Even the choice of U & V seems almost deliberate to make things hard to read.

What am I looking/hoping for?

Improvements to this algorithm (as opposed to the Perl implementation). Feels like there ought to be a way to avoid starting pA at the beginning for each extra partition.

Better algorithms. I've looked, but maybe someone knows of one.

Better approaches to the overall problem.

Any assessment of the complexity. I've tried 3 times and gotton 3 different answers.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

In reply to Can you improve upon my algorithm. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.