taffer has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that currently sorts a multi-dimensional array by multiple fields so that in the end all of the elements of the array are sorted by field a, then by field b within field a, then by field c with field b, etc. I am currently using a bubble sort, but the dataset is starting to get large enough that it is taking too long to execute. To relieve this I decided to switch to a merge sort as I read that it was both more efficient, and a stable sorting algorithm. I got the algorithm working, but it is not sorting stably, so it is only sorted properly by the last search ran on it. Here is the code i currently have, any suggestions would be greatly appreciated:
sub mergeSort { #Takes reference to array, index to sort on, beginn +ing index, and end index my ($aref, $index, $begin, $end)=@_; my $size=$end-$begin; if($size<2) {return;} my $half=$begin+int($size/2); mergeSort($aref, $index, $begin, $half); mergeSort($aref, $index, $half, $end); for(my $i=$begin; $i<$half; ++$i) { if($$aref[$i][$index] > $$aref[$half][$index]) { my @va=@$aref[$i]; my $v=$$aref[$i][$index]; @$aref[$i]=@$aref[$half]; my $i=$half; while($i<$end-1 && $$aref[$i+1][$index] < $v) { (@$aref[$i], @$aref[$i+1])= (@$aref[$i+1], @$aref[$i]); ++$i; } @$aref[$i]=@va; } } } # end sub #Sort by uptime mergeSort(\@data, 8, 0, scalar(@data)); #Sort by 3rd section of interface cx/x/UX mergeSort(\@data, 12, 0, scalar(@data)); #Sort by 2nd section of interface cx/X/ux mergeSort(\@data, 11, 0, scalar(@data)); #Sort by 1st section of interface CX/x/ux mergeSort(\@data, 11, 0, scalar(@data));
For clarification, here is my previous sort code:
sub ASort { my ($aref, $index)=@_; for ($x=scalar(@$aref);$x>0;$x--){ for ($j=0;$j<$x-1;$j++){ if ($$aref[$j][$index]>$$aref[$j+1][$index]){ @temp=@$aref[$j+1]; @$aref[$j+1] = @$aref[$j]; @$aref[$j] = @temp; } } } } #Sort by uptime ASort(\@data, 8); #Sort by 3rd section of interface cx/x/UX ASort(\@data, 12); #Sort by 2nd section of interface cx/X/ux ASort(\@data, 11); #Sort by 1st section of interface CX/x/ux ASort(\@data, 11);

Replies are listed 'Best First'.
Re: Need help making my Merge Sort implementation stable
by moritz (Cardinal) on Jul 03, 2008 at 19:56 UTC
    Why don't you just use the built in sort function with a custom comparison function? It's really fast, and I think it's also stable.
      My understanding is that it would sort just the field, not the entire row. For example, it would just swap @data$i$index, not the @data$i. If it does, I am not sure how to construct the custom comparison function. Could you point me towards some info that would get me started on learning how to construct a custom comparison function that would move an entire row of a multi-dimensional array instead of just the field?
        use Data::Dump 'pp'; # only used for output my @data = map { [ map { int rand(3) } 1 .. 5 ] } 1 .. 10; my @sorted = sort { $a->[1] <=> $b->[1] # sort by column 1 ascending || $a->[0] <=> $b->[0] # then column 0 || $b->[3] <=> $a->[3] # then column 3 descending } @data; pp @sorted;

        gives

        ( [0, 0, 2, 2, 2], [0, 0, 0, 0, 1], [0, 0, 2, 0, 1], [1, 0, 2, 2, 0], [2, 0, 0, 2, 2], [0, 1, 0, 1, 2], [1, 1, 0, 2, 2], [1, 1, 1, 1, 2], [0, 2, 2, 0, 0], [2, 2, 2, 0, 0], )


        Unless I state otherwise, all my code runs with strict and warnings
        My understanding is that it would sort just the field, not the entire row.

        Not quite. Look at this small example:

        use strict; use warnings; use Data::Dumper; my @items = ( [0, 4, 2], [2, 5, 8], [-5, 0, -3], [3, 4, 0], [1, 4, 8], ); my @sorted = sort { $a->[1] <=> $b->[1] } @items; print Dumper \@sorted; __END__ $VAR1 = [ [ -5, 0, -3 ], [ 0, 4, 2 ], [ 3, 4, 0 ], [ 1, 4, 8 ], [ 2, 5, 8 ] ];

        (I tidied up the output a bit to include less line breaks).

        You can see that the array is sorted by the second column, but the result is a list of array references, just like the input.

        sort is very powerful, you only have to know how to use it ;-)

Re: Need help making my Merge Sort implementation stable
by pjotrik (Friar) on Jul 03, 2008 at 20:23 UTC
Re: Need help making my Merge Sort implementation stable
by Tanktalus (Canon) on Jul 03, 2008 at 20:25 UTC

    Rough guess, untested:

    @data = sort { $a->[8] <=> $b->[8] or $a->[12] <=> $b->[12] or $a->[11] <=> $b->[11] } @data;
    I'm missing a second check against index 11, but I don't have any idea why you're sorting on the same field twice, and it's a bit convoluted for me right now :-)

      Ahh, I fat fingered the second 11, it was supposed to be a 10.
Re: Need help making my Merge Sort implementation stable
by Limbic~Region (Chancellor) on Jul 03, 2008 at 22:48 UTC