in reply to Re: Algorithms
in thread Algorithms

Or we might take your example of sorting. Perls sort is generic, its intended to be fast for the "average" data sets and cases. It turns out that the version used in some versions of perl has some _really_ bad cases however, and so it has been changed (isn't even quicksort anymore iirc).

Well, in 5.8.0, the default is mergesort; however, quicksort is still available on demand. But if you select quicksort, a trick will be performed to make it unlikely to hit one of the cases where quicksort behaves quadratic: large arrays will first be shuffled.

Anyway, the point is that in Perl its easy to just use sort to sort things. But in some situations it may not be advisable to do so. For instance it wouldnt make sense IMO to use sort if you were only adding one value and putting it in order. I might use binary search with splice, or even a single pass of a bubble sort if the list was small.

Actually one of the decisions for using the current implementation of mergesort was to be able to deal well with nearly sorted lists. The mergesort in Perl is faster if there are longer runs of sorted, or reversely sorted, elements. So, in 5.8.0, it does make sense to push an element on a sorted array, and then sort that list, instead of using a binary search and splice. Both operations will be linear (!), but the former will do almost everything in C.

Abigail

Replies are listed 'Best First'.
Re: Re: Algorithms
by Bluepixel (Beadle) on Jun 25, 2003 at 19:56 UTC
    Why doesn't perl use heapsort?
      Bluepixel wrote:
      Why doesn't perl use heapsort?
      Because there are no real advantages to heap sort for general sorting, except for the ability of doing in-situ sorting, but that's not relevant with Perl's sort routine. The particular implementation of mergesort Perl is using has a few advantages: it's stable and it performs very well with sorted, or nearly sorted data. It also has a tight upperbound, it's worst-case behaviour is O (N log N). And while the quicksort implementation isn't stable (it could be made stable at some costs) and runs in quadratic time with some inputs, it runs fast on random data - for two reasons: the algorithm used, and it's very memory cache friendly (most computers nowadays have a memory cache). Neither mergesort and heapsort are cache friendly - they go all over the place.

      Abigail