Re: random permutations
by merlyn (Sage) on Mar 11, 2001 at 23:41 UTC
There won't be enough calls to the sort comparison to get a fairly shuffled deck. And by that, I don't mean moderately shuffled; I mean shuffled in a fair way where every possible shuffling is equally likely.
I forget the exact proof... I think abigail is much better than I am at explaining this. I bet if you could check the comp.lang.perl.misc archives, you'd find abigail's explanation on this very topic.
-- Randal L. Schwartz, Perl hacker
The exact proof is simple. There is a maximum number of
comparisons K which may be needed to sort the list of length
N. Each comparison gives a random binary decision.
Therefore we have 2**K equally likely outcomes, each of
which winds up with some sort order. And we have N!
possible sort orders. Now can the sort orders possibly
all come out even? The answer is no for N larger than 2
because if N is at least 3 then N! is divisible by 3, and
therefore 1/N! has an infinite (repeating) decimal expansion.
But the probability of each sort order is a whole number of
those outcomes divided by 2**K, so it has a decimal
expansion that terminates and can never equal 1/N!.
Which sort orders are favoured depends on the details of
the sort algorithm.
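To make the counting argument concrete, here is a minimal sketch that enumerates every possible run of coin flips for N = 3. It assumes a plain insertion sort as a stand-in (sort_with_flips and count_orders are hypothetical names, and Perl's built-in sort uses a different algorithm, so its bias will look different), but the 2**K versus N! mismatch is the same.

```perl
use strict;
use warnings;

# Insertion sort driven by a fixed list of "coin flips" instead of rand().
# Each comparison consumes one flip: 1 means "swap", 0 means "stay put".
# NOTE: hypothetical stand-in for a real sort, chosen for illustration.
sub sort_with_flips {
    my ($flips, @items) = @_;
    my @bits = @$flips;
    for my $i (1 .. $#items) {
        for (my $j = $i; $j > 0; $j--) {
            my $bit = @bits ? shift @bits : 0;
            last unless $bit;
            @items[$j - 1, $j] = @items[$j, $j - 1];
        }
    }
    return join '', @items;
}

# For N = 3 this sort needs at most K = 3 comparisons, so there are
# 2**3 = 8 equally likely flip sequences to spread over 3! = 6 orders.
sub count_orders {
    my %count;
    for my $n (0 .. 7) {
        my @flips = map { ($n >> $_) & 1 } 0 .. 2;
        $count{ sort_with_flips(\@flips, qw(a b c)) }++;
    }
    return %count;
}

my %count = count_orders();
printf "%s: %d/8\n", $_, $count{$_} for sort keys %count;
```

With this particular sort, abc and bac each collect 2 of the 8 outcomes and the other four orders collect 1 each, so two orders are twice as likely as the rest.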
Now if you like the idea of using a sort to scramble
elements, you can do it with a Schwartzian transform. Like this:
my @shuffled = map {$_->[1]}
sort {$a->[0] <=> $b->[0]}
map {[rand(), $_]} @orig;
Or use Fisher-Yates as often discussed.
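For reference, a minimal sketch of a Fisher-Yates shuffle follows. The name fisher_yates_shuffle is just an illustrative choice, and this version returns a shuffled copy instead of shuffling in place. The core List::Util module also provides a ready-made shuffle function.

```perl
use strict;
use warnings;

# Fisher-Yates shuffle: walk from the last index down, swapping each
# element with a randomly chosen element at or below it.
sub fisher_yates_shuffle {
    my @deck = @_;
    for (my $i = $#deck; $i > 0; $i--) {
        my $j = int rand($i + 1);    # 0 <= $j <= $i
        @deck[$i, $j] = @deck[$j, $i];
    }
    return @deck;
}

my @shuffled = fisher_yates_shuffle(1 .. 10);
print "@shuffled\n";
```

Each position picks uniformly among the slots still open to it, so there are exactly N! equally likely ways through the loop, one per ordering.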
BTW I am using infinite decimals vs non-infinite as a
shortcut here to avoid talking about divisibility. This
works because we talk about base 10, and 2 divides 10. But
this is but one of a family of math arguments based on
divisibility. For an example of another fun one, it is not
hard to show that the 400-year cycle of the Gregorian
calendar has a number of days (146,097) divisible by 7 but
not 49. From that you can show, for instance, that the
thirteenth of the month does not fall exactly evenly
among the days of the week. It takes considerably more
effort to figure out that the 13th falls on Friday more
often than straight chance would lead you to expect. :-)
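The Friday claim can also be checked by brute force. The sketch below counts the weekday of every 13th across one full 400-year Gregorian cycle (4800 months). It assumes Sakamoto's table method for the day of the week; day_of_week and count_thirteenths are illustrative names, and any correct day-of-week routine would do.

```perl
use strict;
use warnings;

# Day of week for a Gregorian date: 0 = Sunday .. 6 = Saturday.
# Uses Sakamoto's table method (an assumption of this sketch).
sub day_of_week {
    my ($y, $m, $d) = @_;
    my @t = (0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4);
    $y-- if $m < 3;
    return ($y + int($y / 4) - int($y / 100) + int($y / 400)
            + $t[$m - 1] + $d) % 7;
}

# Count the 13ths over one full 400-year cycle (4800 months).
sub count_thirteenths {
    my @count = (0) x 7;
    for my $year (2001 .. 2400) {
        for my $month (1 .. 12) {
            $count[ day_of_week($year, $month, 13) ]++;
        }
    }
    return @count;
}

my @names = qw(Sun Mon Tue Wed Thu Fri Sat);
my @count = count_thirteenths();
printf "%s: %d\n", $names[$_], $count[$_] for 0 .. 6;
```

Since 4800 is not a multiple of 7 the counts cannot all be equal, and Friday comes out with the largest share.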
Re: random permutations
by gryng (Hermit) on Mar 11, 2001 at 22:27 UTC
Thank you for your comments.
Yes, the Camel book states that some implementations of qsort
do crash when fed inconsistent comparison results; that is a valid
point.
Also, your version feels more Perl-like. But I actually chose the
constants 100 and 50 in 'int(rand(100)-50)' to have only a 2% chance of
an equality result (i.e. 0); my intuition tells me that the
result will thus be more random (this will depend on the implementation of
qsort; but qsort will typically make the same decision each time it is faced with
equality between the elements).
Greetz -- Mave
Hi mave,
The textbook way of doing qsort 'properly' (already I'm on flame-war ground :) ) is to use the median-of-three variant (pick the first, middle and last items and choose the median of these as the partition point) with three speed enhancements: first, use the two remaining items from the median-of-three step as sentinels (instead of having a second conditional in the inner loop); next, at n=20-60ish (a hardware-dependent constant) switch to insertion sort; and lastly, when deciding whether to swap an equal item or keep it in place, choose to swap it.
Of course, all this is meaningless babble. The best thing to do is go in and test it. Hmm, unfortunately my boss got back from out of town work and is on the rampage this morning handing out more work for us all :) . I suggest a randomness test on output (chi-squared) and/or running sample data and counting the number of swaps. I also suggest using this when testing:
my @list = map $_->[1], sort {$a->[0]<=>$b->[0]} map [rand,$_], (0..99);
This should be much faster on larger sets, and you can change the rand part to int rand $n, where $n is the inverse of the chance that two given items will compare as equal.
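The chi-squared test suggested above could be sketched roughly like this. It shuffles a 3-element list many times, tallies the 6 possible orderings, and compares the tallies with a uniform expectation. The helper names tally_orders and chi_squared and the trial count are assumptions made for illustration.

```perl
use strict;
use warnings;

# Shuffle a 3-element list many times and tally how often each of
# the 6 orderings turns up. $shuffler is any list-in, list-out sub.
sub tally_orders {
    my ($trials, $shuffler) = @_;
    my %count;
    $count{ join '', $shuffler->(qw(a b c)) }++ for 1 .. $trials;
    return \%count;
}

# Pearson chi-squared statistic against a uniform expectation over
# $cells possible orderings (orderings never seen count as zero).
sub chi_squared {
    my ($count, $trials, $cells) = @_;
    my $expected = $trials / $cells;
    my ($seen, $stat) = (0, 0);
    for my $observed (values %$count) {
        $stat += ($observed - $expected) ** 2 / $expected;
        $seen++;
    }
    $stat += ($cells - $seen) * $expected;   # unseen cells: (0-E)**2/E = E
    return $stat;
}

my $trials = 60_000;    # arbitrary sample size for this sketch
my %shufflers = (
    'random comparator' => sub { sort { int(rand(100) - 50) } @_ },
    'random sort keys'  => sub {
        map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [rand(), $_] } @_;
    },
);
for my $name (sort keys %shufflers) {
    my $count = tally_orders($trials, $shufflers{$name});
    printf "%-18s chi-squared = %.1f\n", $name, chi_squared($count, $trials, 6);
}
```

With 6 orderings there are 5 degrees of freedom, so a statistic well above about 11.07 (the 5% critical value) suggests the shuffle is not uniform.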
Good Luck,
Gryn
If you want it to be either -1 or 1, but never 0, you could use
sort { int(rand 2)*2-1 } @list