in reply to Re^2: Problems with custom sort
in thread Problems with custom sort
The story in more detail: suppose you want to serve yourself a little slice of Pi.
@digits = ( 3,1,4,1,5,9 );
A numerical sort of the digits will yield (1,1,3,4,5,9), as expected. Which 1 comes first is hard to know, since one 1 looks pretty much like any other. You can regard this as totally trivial, or somewhat profound. However, if you just want to sort the even digits ahead of the odd ones, then what will
sort { ($a % 2) <=> ($b % 2) } @digits;
yield? The only even digit, 4, will come first. But how about the odd numbers, which all compare equal? With the quicksort algorithm used to implement Perl 5.6 and earlier, the order of ties is left up to the sort. So, as you add more and more digits of Pi, the order in which the sorted even and odd digits appear will change. and, for sufficiently large slices of Pi, the quicksort algorithm in Perl 5.8 won't return the same results even if reinvoked with the same input. The justification for this rests with quicksort's worst case behavior. If you run
sort { $a <=> $b } ( 1 .. $N , 1 .. $N );
(something you might approximate if you wanted to merge two sorted arrays using sort), doubling $N doesn't just double the quicksort time, it quadruples it. Quicksort has a worst case run time that can grow like N**2, so-called quadratic behaviour, and it can happen on patterns that may well arise in normal use. You won't notice this for small arrays, but you will notice it with larger arrays, and you may not live long enough for the sort to complete on arrays of a million elements. So the 5.8 quicksort scrambles large arrays before sorting them, as a statistical defence against quadratic behaviour. But that means if you sort the same large array twice, ties may be broken in different ways.
Because of the unpredictability of tie-breaking order, and the quadratic worst-case behaviour, quicksort was almost replaced completely with a stable mergesort. Stable means that ties are broken to preserve the original order of appearance in the input array. So
sort { ($a % 2) <=> ($b % 2) } (3,1,4,1,5,9);
will yield (4,3,1,1,5,9), guaranteed. The even and odd numbers appear in the output in the same order they appeared in the input. Mergesort has worst case O(N log N) behaviour, the best value attainable. And, ironically, this mergesort does particularly well where quicksort goes quadratic: mergesort sorts (1..$N, 1..$N) in O(N) time. But quicksort was rescued at the last moment because it is faster than mergesort on certain inputs and platforms. For example, if you really don't care about the order of even and odd digits, quicksort will run in O(N) time; it's very good at sorting many repetitions of a small number of distinct elements. The quicksort divide and conquer strategy works well on platforms with relatively small, very fast, caches. Eventually, the problem gets whittled down to one that fits in the cache, from which point it benefits from the increased memory speed.
Quicksort was rescued by implementing a sort pragma to control aspects of the sort. The stable subpragma forces stable behaviour, regardless of algorithm. The _quicksort and _mergesort subpragmas are heavy-handed ways to select the underlying implementation. The leading _ is a reminder that these subpragmas may not survive beyond 5.8. More appropriate mechanisms for selecting the implementation exist, but they wouldn't have arrived in time to save quicksort.
|
---|