Re: Re^2: Problems with custom sort

I would just add that the behaviour of sort is still dynamic with changes in internal workings listed in perldelta for 5.8.0rc2 - From this document ...

sort() is also fully reentrant, in the sense that the sort function can itself call sort(). This did not work reliably in previous releases.

sort() has been changed to use primarily mergesort internally as opposed to the earlier quicksort. For very small lists this may result in slightly slower sorting times, but in general the speedup should be at least 20%. Additional bonuses are that the worst case behaviour of sort() is now better (in computer science terms it now runs in time O(N log N), as opposed to quicksort's Theta(N**2) worst-case run time behaviour), and that sort() is now stable (meaning that elements with identical keys will stay ordered as they were before the sort). See the sort pragma for information.
The story in more detail: suppose you want to serve yourself a little slice of Pi.
```
    @digits = ( 3,1,4,1,5,9 );
```
A numerical sort of the digits will yield (1,1,3,4,5,9), as expected. Which 1 comes first is hard to know, since one 1 looks pretty much like any other. You can regard this as totally trivial, or somewhat profound. However, if you just want to sort the even digits ahead of the odd ones, then what will
```
    sort { ($a % 2) <=> ($b % 2) } @digits;
```
yield? The only even digit, 4, will come first. But how about the odd numbers, which all compare equal? With the quicksort algorithm used to implement Perl 5.6 and earlier, the order of ties is left up to the sort. So, as you add more and more digits of Pi, the order in which the sorted even and odd digits appear will change. and, for sufficiently large slices of Pi, the quicksort algorithm in Perl 5.8 won't return the same results even if reinvoked with the same input. The justification for this rests with quicksort's worst case behavior. If you run
```
   sort { $a <=> $b } ( 1 .. $N , 1 .. $N );
```
(something you might approximate if you wanted to merge two sorted arrays using sort), doubling $N doesn't just double the quicksort time, it quadruples it. Quicksort has a worst case run time that can grow like N**2, so-called quadratic behaviour, and it can happen on patterns that may well arise in normal use. You won't notice this for small arrays, but you will notice it with larger arrays, and you may not live long enough for the sort to complete on arrays of a million elements. So the 5.8 quicksort scrambles large arrays before sorting them, as a statistical defence against quadratic behaviour. But that means if you sort the same large array twice, ties may be broken in different ways.

Because of the unpredictability of tie-breaking order, and the quadratic worst-case behaviour, quicksort was almost replaced completely with a stable mergesort. Stable means that ties are broken to preserve the original order of appearance in the input array. So
```
    sort { ($a % 2) <=> ($b % 2) } (3,1,4,1,5,9);
```
will yield (4,3,1,1,5,9), guaranteed. The even and odd numbers appear in the output in the same order they appeared in the input. Mergesort has worst case O(N log N) behaviour, the best value attainable. And, ironically, this mergesort does particularly well where quicksort goes quadratic: mergesort sorts (1..$N, 1..$N) in O(N) time. But quicksort was rescued at the last moment because it is faster than mergesort on certain inputs and platforms. For example, if you really don't care about the order of even and odd digits, quicksort will run in O(N) time; it's very good at sorting many repetitions of a small number of distinct elements. The quicksort divide and conquer strategy works well on platforms with relatively small, very fast, caches. Eventually, the problem gets whittled down to one that fits in the cache, from which point it benefits from the increased memory speed.

Quicksort was rescued by implementing a sort pragma to control aspects of the sort. The stable subpragma forces stable behaviour, regardless of algorithm. The _quicksort and _mergesort subpragmas are heavy-handed ways to select the underlying implementation. The leading _ is a reminder that these subpragmas may not survive beyond 5.8. More appropriate mechanisms for selecting the implementation exist, but they wouldn't have arrived in time to save quicksort.

Comment on Re: Re^2: Problems with custom sort Select or Download Code