In reply to the mergesort scratch space question
Hoare dubbed his method "quicksort", and that name is not inappropriate, since the inner loops of the computation are extremely fast on most computers. All comparisons during a given stage are made against the same key, so this key may be kept in a register. Only a single index needs to be changed between comparisons. Furthermore, the amount of data movement is quite reasonable.
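To make that concrete, here is a minimal sketch (my own, not from the thread) of a Hoare-style partition in Perl. Perl has no registers, of course, but the structure the quote describes is still visible: every comparison in each scan loop is against the same pivot held in one lexical, and only a single index changes between comparisons.

```perl
use strict;
use warnings;

# In-place quicksort with Hoare partitioning (illustrative sketch).
sub quicksort {
    my ( $a, $lo, $hi ) = @_;
    return if $lo >= $hi;

    # The "key kept in a register": one pivot value per stage.
    my $pivot = $a->[ $lo + int( ( $hi - $lo ) / 2 ) ];
    my ( $i, $j ) = ( $lo - 1, $hi + 1 );
    while (1) {
        # Each scan loop: one comparison, one index increment.
        do { $i++ } while $a->[$i] < $pivot;
        do { $j-- } while $a->[$j] > $pivot;
        last if $i >= $j;
        @$a[ $i, $j ] = @$a[ $j, $i ];    # modest data movement: one swap
    }
    quicksort( $a, $lo,    $j );
    quicksort( $a, $j + 1, $hi );
}
```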
But personally I wouldn't let these comments, or the authors cited so far in this thread, discourage you. Code a traditional implementation of mergesort. Code your version. Benchmark them against different data sets, small and large, with various properties (in-order, reverse order, random, semi-random). Explore what makes them fast or slow. Post your results. :-)
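For the benchmarking step, here is one way the four kinds of data set might be generated (a sketch; the names, size, and the "few random swaps" notion of semi-random are my own choices):

```perl
use strict;
use warnings;
use List::Util 'shuffle';

my $n = 10_000;

# Semi-random: start in order, then disturb it with a few random swaps.
my @semi = ( 1 .. $n );
for ( 1 .. $n / 100 ) {
    my ( $i, $j ) = ( int rand $n, int rand $n );
    @semi[ $i, $j ] = @semi[ $j, $i ];
}

my %sets = (
    inorder    => [ 1 .. $n ],
    revorder   => [ reverse 1 .. $n ],
    random     => [ shuffle 1 .. $n ],
    semirandom => \@semi,
);
```

Each array reference in %sets can then be copied and fed to the sorts under test (copying matters, since an in-place sort would leave the set sorted for the next run).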
As a teaser, consider the two following linear search techniques as an example of the importance of a tight inner loop:
    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    sub linsearch {
        my ( $array, $item ) = @_;
        my $index = 0;
        $index++ while $index < @$array && $array->[$index] ne $item;
        return $index < @$array ? $index : -1;
    }

    sub fast_linsearch {
        my ( $array, $item ) = @_;
        my $index = -1;
        # Sentinel: append the item so the inner loop needs no bounds check
        push @$array, $item;
        1 while $array->[ ++$index ] ne $item;
        pop @$array;
        return $index < @$array ? $index : -1;
    }

    our $array = [ 1 .. 100 ];
    cmpthese( -1, {
        lin_s  => 'linsearch($::array,100)',
        fast_s => 'fast_linsearch($::array,100)',
    } );

    __END__
    Benchmark: running fast_s, lin_s, each for at least 1 CPU seconds...
        fast_s:  1 wallclock secs ( 1.01 usr +  0.00 sys =  1.01 CPU) @ 11182.81/s (n=11317)
         lin_s:  1 wallclock secs ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 5761.40/s (n=5940)
              Rate  lin_s fast_s
    lin_s   5761/s     --   -48%
    fast_s 11183/s    94%     --

Actually, your post got me interested, and I ended up implementing Knuth's Algorithm 5.2.4S as an exercise. The idea there is to reduce the work of the inner loop by imposing run lengths of increasing size instead of scanning for "stepdowns" (i.e. the points where the input file is out of order). I include it below for your edification. A few notes: this is a two-way merge sort in which we treat the two ends of the initial array as the lists to merge. First we "merge" the leftmost and rightmost elements, writing them into the left end of the buffer; then we merge the 2nd and 2nd-last elements together, writing them in _reverse_ order into the right side of the buffer. We repeat this until the array has been copied, then double the run length and start over. Eventually the entire array is merged and sorted. For more, read the third volume of Knuth's The Art of Computer Programming series.

    sub straight_merge {
        my ($array) = (@_);
        my $buffer  = [];
        $#$buffer = $#$array;
        my $len = 1;    # starts with "runs" of length 1
        while ( $len < @$array ) {
            my ( $al, $ar ) = ( 0, $#$array );
            my ( $bl, $br ) = ( -1, scalar @$array );
            my $dir = 1;
            my ( $lremain, $rremain ) = ( $len, $len ); # how many elements from each side
          PASS:
            while ( $al <= $ar ) {
                if ( $array->[$al] < $array->[$ar] ) {
                    $bl += $dir;
                    $buffer->[$bl] = $array->[$al];
                    $al++;
                    next if --$lremain > 0;
                    while (1) {
                        $bl += $dir;
                        last PASS if $bl == $br;
                        $buffer->[$bl] = $array->[$ar];
                        $ar--;
                        last unless --$rremain;
                    }
                }
                else {
                    $bl += $dir;
                    $buffer->[$bl] = $array->[$ar];
                    $ar--;
                    $rremain--;
                    next if $rremain > 0;
                    while (1) {
                        $bl += $dir;
                        last PASS if $bl == $br;
                        $buffer->[$bl] = $array->[$al];
                        $al++;
                        last unless --$lremain;
                    }
                }
                # Switch sides
                ( $lremain, $rremain ) = ( $len, $len );
                $dir = -$dir;
                ( $bl, $br ) = ( $br, $bl );
                if ( $ar - $al < $len ) {    # do a partial copy
                    while (1) {
                        $bl += $dir;
                        last PASS if $bl == $br;
                        $buffer->[$bl] = $array->[$al];
                        $al++;
                    }
                }
            }    # eof PASS
            # flip the buffers
            $len += $len;
            last if $len > @$array;
            ( $array, $buffer ) = ( $buffer, $array );
        }
        unless ( $array == $_[0] ) {
            # do a copy
            @$array = @$buffer;
        }
    }

A last thought about developing optimal algorithms in Perl: in Perl, optimizing is often a matter of reducing ops, not reducing "instructions". This does not mean that a low-level assembly or C implementation of an algorithm optimised this way will actually be optimal. For instance, in the above code all of the operations are done at a fairly high level by Perl, with each statement being worth many CPU instructions, whereas the same code translated to assembly would be tiny, with the inner loops being only a few instructions. Conversely, an algorithm in Perl that takes advantage of, say, splice() would probably outperform it, because the splice() operation costs less CPU time than the equivalent pure-Perl operations that do the same thing; yet that algorithm, when converted to assembly, would be much bigger and thus slower. I.e.: YMMV.
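As a small illustration of that op-counting point (my own example, not from the thread): removing one element with the splice() builtin is a single Perl op, while the equivalent hand-rolled shift loop pays several ops per element, even though a C compiler would render both as a short memmove-style loop.

```perl
use strict;
use warnings;

# One builtin op does all the shifting inside perl's C core.
sub remove_splice {
    my ( $array, $i ) = @_;
    splice @$array, $i, 1;
}

# Same effect in pure Perl: every iteration costs multiple ops
# (two index computations, a fetch, a store, the loop bookkeeping).
sub remove_by_hand {
    my ( $array, $i ) = @_;
    $array->[$_] = $array->[ $_ + 1 ] for $i .. $#$array - 1;
    pop @$array;
}

my @a = ( 10, 20, 30, 40 );
my @b = @a;
remove_splice( \@a, 1 );
remove_by_hand( \@b, 1 );
# Both arrays now hold (10, 30, 40).
```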
:-)
Thanks for the thought provoking post...
--- demerphq
my friends call me, usually because I'm late....