in reply to Randomize an array
Back in the day, Perl didn't have its own sort and just used the qsort() provided by the local C run-time library. I seem to recall that things like sort {(-1,1)[rand(2)]} could cause the algorithm to take forever to finish or even dump core.
Now a truly efficient sort should not be able to notice, because it wouldn't do any more comparisons than it needed to. So perhaps the quality of Perl's sort is much higher than that of many of the qsort()s from back then.
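For reference, the idiom in question looks something like this (a sketch only; the comparator is deliberately inconsistent, so no particular behavior is guaranteed by any sort implementation):

```perl
# The random-comparator trick under discussion: each comparison
# answers -1 or 1 at random, so the "ordering" is inconsistent
# and a sort is entitled to do anything at all with it.
my @array    = (1 .. 10);
my @shuffled = sort { (-1, 1)[rand 2] } @array;
```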
- tye (but my friends call me "Tye")
RE: Re: Randomize an array
by Adam (Vicar) on Sep 08, 2000 at 05:01 UTC
We break from this post to give a brief description of quicksort, for those who don't know it; see the sketch below. (Disclaimer: I didn't test that and I haven't looked any of this up, so I could be completely wrong and that code might not work.) We now return to the post already in progress.

So you see, the amount of time @a = sort {(-1,1)[rand 2]} @a will take is finite and well defined, regardless of the function used to determine which side of the pivot each element lands on. So the only reason it would dump core would be that it ran out of memory, but it should never hang (unless, as I said, it ran out of memory... those sneaky recursive algorithms). The only thing that varies in a quicksort is how well the pivot is chosen. If you pick an extreme (where everything ends up on one side of the pivot), then quicksort will take much longer than if you pick the middle, where everything is balanced. So the point of all this is that I doubt the implementation has gotten better, but rather the hardware (128 MB of RAM sure beats 2 or 4 MB).

Update: For more about sorting, check out these great documents I found: Here and here. Not to forget Mastering Algorithms with Perl from O'Reilly, which has a whole chapter devoted to sorting algorithms.

BTW: After at least an hour of working on this post, I have realized that a true quicksort would, in fact, go crazy given { (-1,1)[rand 2] }, and I am now somewhat mystified as to why it converges at all. I think it's time to go eat something; I will return to this later.
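Here is a minimal, untested sketch of the recursive quicksort shape described above (first element as pivot, comparator passed in so the random one can be dropped in; qsort_sketch is a hypothetical name, and this is an illustration, not Perl's actual internal sort):

```perl
# Untested sketch: recursive quicksort, first element as pivot.
sub qsort_sketch {
    my ($cmp, @list) = @_;
    return @list if @list <= 1;
    my $pivot = shift @list;
    my (@less, @more);
    for my $elem (@list) {
        # Whatever $cmp answers, $elem lands on exactly one side,
        # so every recursive call sees a strictly shorter list.
        if ($cmp->($elem, $pivot) < 0) { push @less, $elem }
        else                           { push @more, $elem }
    }
    return ( qsort_sketch($cmp, @less), $pivot, qsort_sketch($cmp, @more) );
}

# Ordinary use:           qsort_sketch(sub { $_[0] <=> $_[1] }, @a)
# The trick in question:  qsort_sketch(sub { (-1, 1)[rand 2] },  @a)
```

Note that in this list-building version each partition strictly shrinks the problem, no matter what the comparator returns, which is the intuition behind the "finite and well defined" claim above.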
Update: My apologies to anyone that I have confused with this post. Maybe it will inspire you to learn more about sort.
by tilly (Archbishop) on Sep 08, 2000 at 06:01 UTC
Clearly, what is called qsort really need not be a quicksort, for many possible reasons. Now for more details. You are right that the average time for qsort is n*log(n). However, qsort actually has a worst case of n^2. In fact, the worst case with a naive qsort is hit on an already sorted list. Perl currently uses a qsort with pivots chosen carefully to move the worst case to something unlikely. You are also right that no sorting algorithm can possibly beat an average case of O(n*log(n)). Why not? For the simple reason that there are n! possible permutations you have to deal with, and in m comparisons you can distinguish at most 2^m of them. So the number of comparisons you need is at least log_2(n!). Up to a constant factor, that is log(n!), which is
    log(n!) = log(1*2*...*n)
            = log(1) + log(2) + ... + log(n)

which is approximately the integral of log(x) from 1 to n, which in turn is n*log(n) - n + 1, plus error terms from the approximation. (Carrying the derivation all the way through gives Stirling's Approximation.)
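As a quick numeric sanity check (my own illustration, not part of the original post), log_2(n!) can be compared against n*log_2(n) directly; they track each other closely, differing by the lower-order n term from the integral above:

```perl
# Compare the lower bound log2(n!) against n*log2(n); Perl's log()
# is the natural log, so divide by log(2) to get base-2 logs.
for my $n (10, 100, 1000) {
    my $log2_fact = 0;
    $log2_fact += log($_) / log(2) for 1 .. $n;   # log2(n!) as a sum
    my $nlog2n = $n * log($n) / log(2);           # n * log2(n)
    printf "n=%4d   log2(n!)=%9.1f   n*log2(n)=%9.1f\n",
           $n, $log2_fact, $nlog2n;
}
```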
Right now all that concerns us is that n*log(n) term out front. You cannot get rid of it. Now, that said, there are many sorting algorithms out there. qsort is simple and has excellent average performance, but there are others, such as merge sort variants, that have guaranteed worst-case performance and are order n on already sorted datasets. Incidentally, there is right now an interesting discussion of this on p5p, including a brand-new discovery of a memory leak...
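For the curious, here is one way such a sort can look: an untested sketch of a natural merge sort (my own illustration, not code from the thread, numeric comparisons for brevity). It guarantees O(n*log(n)) in the worst case and degenerates gracefully to a single pass on sorted input:

```perl
# Merge two sorted array refs into one sorted list (consumes both).
sub merge_runs {
    my ($left, $right) = @_;
    my @out;
    while (@$left && @$right) {
        push @out, ($left->[0] <= $right->[0]) ? shift @$left
                                               : shift @$right;
    }
    return (@out, @$left, @$right);
}

# Natural merge sort: a sorted input is one run, so it costs O(n).
sub natural_mergesort {
    my @in = @_;
    return @in if @in <= 1;

    # Pass 1: split the input into maximal ascending runs.
    my @runs = ([ shift @in ]);
    for my $x (@in) {
        if ($x >= $runs[-1][-1]) { push @{ $runs[-1] }, $x }
        else                     { push @runs, [$x] }
    }

    # Then merge adjacent runs until a single sorted run remains.
    while (@runs > 1) {
        my @next;
        push @next, [ merge_runs(shift @runs, shift @runs) ]
            while @runs >= 2;
        push @next, @runs;    # carry over an odd run, if any
        @runs = @next;
    }
    return @{ $runs[0] };
}
```

On an already sorted list the run-splitting pass finds one run and the merge loop never executes; in the worst case (a strictly descending list, n runs of length 1) the merging takes the usual log(n) rounds of O(n) work each.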
by Adam (Vicar) on Sep 09, 2000 at 00:33 UTC
by tilly (Archbishop) on Sep 09, 2000 at 00:55 UTC