Slightly off topic, albeit a question. Please excuse.

I have contemplated the idea of merge sorting in place, with a smaller scratch size (between 1 and N/2):

The array to be sorted is in memory. When it is split, the root index and length of the recursed slice are passed on.
After having gone down to a stage where bubble/insertion sorts are used, or when slices of length 1 are merged, the merge should (in my opinion) go as follows:

The next elements of the two slices to be merged are known.
Merging of the elements is performed normally, except that when writing: if the position the next element is to be written to (sweeping over slice 1, then slice 2) is the current read position of slice 1, then the displaced elements are pushed onto a FIFO structure.
While merging, if the FIFO structure has any elements pending, those are merged with slice 2; otherwise the next element of slice 1 is merged with slice 2.
When the writing index reaches the next unread element of slice 2, the two slices are fully merged and we return from the recursion.
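
Here is a rough sketch of what I have in mind, as plain Perl (the names and the exact bookkeeping are mine, and I may well be overlooking something):

use strict;
use warnings;

# Merge the adjacent sorted runs $a->[$lo..$mid-1] (slice 1) and
# $a->[$mid..$hi] (slice 2) in place.  Whenever the write position is about
# to overwrite an unread slice-1 element, that element is parked in a FIFO.
sub merge_with_fifo {
    my ( $a, $lo, $mid, $hi ) = @_;
    my @fifo;    # displaced slice-1 elements, still in sorted order
    my ( $i, $j, $w ) = ( $lo, $mid, $lo );

    # once every slice-1 element is placed, the tail of slice 2 is already
    # in its final position, so that is the loop condition
    while ( @fifo || $i < $mid ) {
        my $left      = @fifo ? $fifo[0] : $a->[$i];
        my $from_left = $j > $hi || $left <= $a->[$j];

        # about to overwrite an unread slice-1 element: park it
        push @fifo, $a->[ $i++ ] if $i < $mid && $w == $i;

        if ($from_left) {
            $a->[ $w++ ] = @fifo ? shift @fifo : $a->[ $i++ ];
        }
        else {
            $a->[ $w++ ] = $a->[ $j++ ];
        }
    }
}

# The driver is just an ordinary top-down mergesort split.
sub mergesort_inplace {
    my ( $a, $lo, $hi ) = @_;
    ( $lo, $hi ) = ( 0, $#$a ) unless defined $lo;
    return if $hi <= $lo;
    my $mid = int( ( $lo + $hi ) / 2 ) + 1;    # slice 1: lo..mid-1, slice 2: mid..hi
    mergesort_inplace( $a, $lo, $mid - 1 );
    mergesort_inplace( $a, $mid, $hi );
    merge_with_fifo( $a, $lo, $mid, $hi );
}

my @data = ( 4, 2, 7, 1, 9, 3, 8 );
mergesort_inplace( \@data );
print "@data\n";    # 1 2 3 4 7 8 9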

My question is: why hasn't such a scheme been implemented before? I always see documentation of sorting algorithms saying that mergesort's main setback is the scratch space it needs, and yet most say that quicksort's disadvantages can be taken care of by tweaking. Hence I figured I'm missing something... Could someone tell me what that is?

Thanks in advance

-nuffin
zz zZ Z Z #!perl

Replies are listed 'Best First'.
Re: mergesort scratch space question
by BazB (Priest) on Oct 19, 2002 at 16:45 UTC

    From what I can see in Robert Sedgewick's Algorithms in C, a mergesort algorithm becomes significantly more complex when it is modified to sort in place, removing the requirement of extra space proportional to the number of records. It may also affect the stability of the sort, which is one of mergesort's valued properties.

    If the extra space is a problem, something like a heapsort may be an alternative.
    Both the mergesort and heapsort algorithms generally run in N log N time.
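
    For reference, here is a minimal in-place heapsort sketch (not taken from the book, just the textbook algorithm from memory): it needs no scratch space at all, though unlike mergesort it is not stable.

    use strict;
    use warnings;

    # restore the max-heap property for the subtree rooted at $root,
    # considering only indices up to $end
    sub sift_down {
        my ( $a, $root, $end ) = @_;
        while ( 2 * $root + 1 <= $end ) {
            my $child = 2 * $root + 1;
            $child++ if $child < $end && $a->[$child] < $a->[ $child + 1 ];
            last if $a->[$root] >= $a->[$child];
            @$a[ $root, $child ] = @$a[ $child, $root ];    # swap
            $root = $child;
        }
    }

    # sort @$a in place: build a max-heap, then repeatedly move the largest
    # remaining element to the end of the unsorted region
    sub heapsort {
        my ($a) = @_;
        my $n = @$a;
        sift_down( $a, $_, $n - 1 ) for reverse 0 .. int( $n / 2 ) - 1;
        for my $end ( reverse 1 .. $n - 1 ) {
            @$a[ 0, $end ] = @$a[ $end, 0 ];
            sift_down( $a, 0, $end - 1 );
        }
    }

    my @data = ( 6, 1, 9, 3, 7 );
    heapsort( \@data );
    print "@data\n";    # 1 3 6 7 9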

    Cheers.

    BazB

    Update: BTW, I recommend the aforementioned book if you're interested in an easy-to-understand but complete reference on algorithms.
    I've not looked at O'Reilly's Mastering Algorithms with Perl, but I believe it is also a great textbook.

    I've only used a basic mergesort, so have no personal experience with more complex implementations.

Re: mergesort scratch space question
by Anonymous Monk on Oct 19, 2002 at 20:30 UTC
    Because it is more complicated than it has to be?

    Observe. Let one of the slices be aligned at the top of the pre-allocated space that the merge will go into. Then the merge can be done safely with no need for any FIFO: the slice in the allocated space is eaten away as the merged set grows beneath it, so the merge never overwrites an element it has not yet read.
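
    In Perl it might look something like this (an untested sketch; the argument order and offsets are my own convention). The second source is the one that is allowed to live in the top of the destination window:

    use strict;
    use warnings;

    # Merge $len1 sorted elements at $src1->[$off1..] with $len2 sorted elements
    # at $src2->[$off2..] into $dst->[$doff .. $doff+$len1+$len2-1].
    # $src2 may be the same array as $dst *if* it occupies the top $len2 slots
    # of the destination window: the write index then never catches up with
    # the unread part of it.
    sub merge_into {
        my ( $src1, $off1, $len1, $src2, $off2, $len2, $dst, $doff ) = @_;
        my ( $i, $j, $w ) = ( 0, 0, $doff );
        while ( $i < $len1 && $j < $len2 ) {
            if ( $src1->[ $off1 + $i ] <= $src2->[ $off2 + $j ] ) {
                $dst->[ $w++ ] = $src1->[ $off1 + $i++ ];
            }
            else {
                $dst->[ $w++ ] = $src2->[ $off2 + $j++ ];
            }
        }
        $dst->[ $w++ ] = $src1->[ $off1 + $i++ ] while $i < $len1;
        $dst->[ $w++ ] = $src2->[ $off2 + $j++ ] while $j < $len2;
    }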

    With this knowledge we can implement a destructive copy_sort that copies data from one buffer to another, sorting in the process. Implementation: if you have one element, move it. If you have more, then copy_sort the top half of the first buffer into the top half of the second. Then copy_sort the bottom half of the first buffer into the top half of the first buffer. Now merge the two sorted halves into the second buffer.
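
    Using the merge_into() sketch above (again untested, with my own choice of halving):

    # Sort $n elements starting at $src->[$soff], leaving the result in
    # $dst->[$doff .. $doff+$n-1].  The source region is destroyed.
    sub copy_sort {
        my ( $src, $soff, $dst, $doff, $n ) = @_;
        return if $n <= 0;
        if ( $n == 1 ) { $dst->[$doff] = $src->[$soff]; return }
        my $top = int( ( $n + 1 ) / 2 );    # top half, rounded up
        my $bot = $n - $top;                # bottom half, rounded down
        # top half of the source -> top of the destination window
        copy_sort( $src, $soff + $bot, $dst, $doff + $bot, $top );
        # bottom half of the source -> top of the source window
        copy_sort( $src, $soff, $src, $soff + $top, $bot );
        # safe merge: the half already inside the destination sits at its top
        merge_into( $src, $soff + $top, $bot, $dst, $doff + $bot, $top, $dst, $doff );
    }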

    Now you can implement a merge_sort that works in place with a scratch size of about N/2, without loss of efficiency. Allocate the scratch area. Then copy_sort the top half of the original data into the scratch area. Now copy_sort the bottom half of the original data into the top half of the original space. Merge the two in the original space. Free the scratch region. Return.
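
    Put together (same assumptions as above):

    # In-place merge sort over @$a with roughly N/2 extra space.
    sub merge_sort {
        my ($a) = @_;
        my $n = @$a;
        return if $n < 2;
        my $top = int( ( $n + 1 ) / 2 );
        my $bot = $n - $top;
        my @scratch;
        # top half of the data -> scratch (about N/2 elements)
        copy_sort( $a, $bot, \@scratch, 0, $top );
        # bottom half of the data -> the now-vacated top of the array
        copy_sort( $a, 0, $a, $top, $bot );
        # final merge; the in-array run sits at the top of the destination,
        # so nothing unread is ever overwritten
        merge_into( \@scratch, 0, $top, $a, $top, $bot, $a, 0 );
    }

    my @data = ( 5, 3, 8, 1, 9, 2 );
    merge_sort( \@data );
    print "@data\n";    # 1 2 3 5 8 9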

    If you are hard up on space, you can lose some efficiency but only need about N/3 of scratch. Allocate the scratch area. copy_sort the top third into the new space. copy_sort the bottom third into the top third's slots. copy_sort the middle third into the bottom third's slots. Merge the two runs now in the array into the top 2/3 of the array. Merge the scratch area back in over the whole array. Free the scratch region. Return.
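
    As a sketch (same helpers and caveats as above):

    # In-place merge sort over @$a with roughly N/3 extra space,
    # at the cost of one extra merge pass over 2/3 of the data.
    sub merge_sort_third {
        my ($a) = @_;
        my $n = @$a;
        return if $n < 2;
        my $t = int( $n / 3 );               # bottom and middle thirds
        return merge_sort($a) if $t == 0;    # too small to bother
        my $t3 = $n - 2 * $t;                # top "third", possibly a bit larger
        my @scratch;
        copy_sort( $a, 2 * $t, \@scratch, 0, $t3 );    # top third -> scratch
        copy_sort( $a, 0, $a, $n - $t, $t );           # bottom third -> top slots
        copy_sort( $a, $t, $a, 0, $t );                # middle third -> bottom slots
        # merge the two in-array runs into the top 2/3 of the array
        merge_into( $a, 0, $t, $a, $n - $t, $t, $a, $n - 2 * $t );
        # merge the scratch run back in over the whole array
        merge_into( \@scratch, 0, $t3, $a, $n - 2 * $t, 2 * $t, $a, 0 );
    }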

    Other variations on the theme are certainly possible.

    I haven't heard of anyone implementing any of these, which mainly means that I never looked. Most presentations of common algorithms try to get the basic idea across, not the best implementation. And sorting is probably the most heavily studied algorithm problem, so you won't have a new (good) sorting idea at random.

      so you won't have a new (good) sorting idea at random.

      Indeed. And considering that merge sorting was one of the earliest sorts to be studied (von Neumann suggested it in 1945), a truly significant improvement based on its ideas is probably especially unlikely.

      --- demerphq
      my friends call me, usually because I'm late....

Re: mergesort scratch space question
by demerphq (Chancellor) on Oct 20, 2002 at 19:35 UTC
    A part of making algorithms more efficient involves keeping the inner loop as simple as possible. It seems to me that your idea makes the inner loop more complex, thus slowing it down. Classic time/space trade-off. I did a quick review of what Knuth has to say about mergesorts and quicksorts. In the case of the former, this seems to be the basis of the two variants (5.2.4S, "Straight Two-way Merge", and 5.2.4L, "List Merge Sort") of the natural merge sort (5.2.4N, "Natural Two-way Merge") that he presents. And with quicksort it is an explicitly mentioned feature:

    Hoare dubbed his method "quicksort", and that name is not inappropriate, since the inner loops of the computation are extremely fast on most computers. All comparisons during a given stage are made against the same key, so this key may be kept in a register. Only a single index needs to be changed between comparisons. Furthermore the amount of data movement is quite reasonable;

    But personally I wouldn't let these comments, or the other authors in this thread so far, discourage you. Code a traditional implementation of mergesort. Code your version. Benchmark them against different data sets, small and large, with various properties (in order, reverse order, random, semi-random). Explore what makes them fast or slow. Post your results. :-)

    As a parting thought, consider the following two linear search techniques as an example of the importance of a tight inner loop:

    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    sub linsearch {
        my ( $array, $item ) = @_;
        my $index = 0;
        $index++ while $index < @$array && $array->[$index] ne $item;
        return $index < @$array ? $index : -1;
    }

    sub fast_linsearch {
        my ( $array, $item ) = @_;
        my $index = -1;
        push @$array, $item;
        1 while $array->[ ++$index ] ne $item;
        pop @$array;
        return $index < @$array ? $index : -1;
    }

    our $array = [ 1 .. 100 ];

    cmpthese(
        -1,
        {
            lin_s  => 'linsearch($::array,100)',
            fast_s => 'fast_linsearch($::array,100)',
        }
    );

    __END__
    Benchmark: running fast_s, lin_s, each for at least 1 CPU seconds...
        fast_s:  1 wallclock secs ( 1.01 usr +  0.00 sys =  1.01 CPU) @ 11182.81/s (n=11317)
         lin_s:  1 wallclock secs ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 5761.40/s (n=5940)
             Rate  lin_s fast_s
    lin_s  5761/s     --   -48%
    fast_s 11183/s    94%     --
    Actually your post got me interested and I ended up implementing Knuth's Algorithm 5.2.4S as an exercise. The idea there is to reduce the work of the inner loop by imposing run lengths of increasing size instead of scanning for "stepdowns" (i.e. places where the input is out of order). I include it below for your edification... (A few notes: this is a two-way merge sort in which we treat the two ends of the initial array as the items to merge. First we "merge" the leftmost and rightmost elements, writing them into the left end of the buffer. Then we merge the second and second-to-last together, writing them in _reverse_ order into the right end of the buffer. We repeat this until the array has been copied, then double the run length and repeat. Eventually the entire array is merged and sorted. For more, read the third volume of Knuth's The Art of Computer Programming.)
    sub straight_merge {
        my ($array) = (@_);
        my $buffer = [];
        $#$buffer = $#$array;
        my $len = 1;    # starts with "runs" of length 1
        while ( $len < @$array ) {
            my ( $al, $ar ) = ( 0, $#$array );
            my ( $bl, $br ) = ( -1, scalar @$array );
            my $dir = 1;
            my ( $lremain, $rremain ) = ( $len, $len );    # how many elements from each side
          PASS:
            while ( $al <= $ar ) {
                if ( $array->[$al] < $array->[$ar] ) {
                    $bl += $dir;
                    $buffer->[$bl] = $array->[$al];
                    $al++;
                    next if --$lremain > 0;
                    while (1) {
                        $bl += $dir;
                        last PASS if $bl == $br;
                        $buffer->[$bl] = $array->[$ar];
                        $ar--;
                        last unless --$rremain;
                    }
                }
                else {
                    $bl += $dir;
                    $buffer->[$bl] = $array->[$ar];
                    $ar--;
                    $rremain--;
                    next if $rremain > 0;
                    while (1) {
                        $bl += $dir;
                        last PASS if $bl == $br;
                        $buffer->[$bl] = $array->[$al];
                        $al++;
                        last unless --$lremain;
                    }
                }
                #Switch sides
                ( $lremain, $rremain ) = ( $len, $len );
                $dir = -$dir;
                ( $bl, $br ) = ( $br, $bl );
                if ( $ar - $al < $len ) {    # do a partial copy
                    while (1) {
                        $bl += $dir;
                        last PASS if $bl == $br;
                        $buffer->[$bl] = $array->[$al];
                        $al++;
                    }
                }
                #eof PASS
            }
            #flip the buffers
            $len += $len;
            last if $len > @$array;
            ( $array, $buffer ) = ( $buffer, $array );
        }
        unless ( $array == $_[0] ) {
            #do a copy
            @$array = @$buffer;
        }
    }
    A last thought about developing optimal algorithms in Perl: in Perl, optimizing is often a matter of reducing ops and not reducing "instructions". This does not mean that a low-level assembly or C implementation of an algorithm optimised this way will actually be optimal. For instance, in the above code all of the operations are done at a fairly high level by perl, with each statement being worth many CPU instructions, whereas the same code translated to assembly would be tiny, with the inner loops being only a few instructions. On the other hand, an algorithm in Perl that takes advantage of, say, splice() would probably outperform it, because the splice() costs less CPU time than the equivalent pure-Perl operations that do the same thing. That same algorithm converted to assembly, however, would be much bigger and thus slower. IE: YMMV.
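
    A quick illustration of the "fewer perl ops beats fewer theoretical instructions" point (numbers will of course vary by machine and perl version):

    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    our @template = ( 1 .. 1000 );

    cmpthese(
        -1,
        {
            # many small perl ops
            shift_loop => sub {
                my @a = @template;
                shift @a for 1 .. 500;
            },
            # one op that does the same work inside perl's C guts
            one_splice => sub {
                my @a = @template;
                splice @a, 0, 500;
            },
        }
    );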

    :-)

    Thanks for the thought provoking post...

    --- demerphq
    my friends call me, usually because I'm late....