Re: Fisher-Yates theory
by MarkM (Curate) on Jul 24, 2003 at 05:39 UTC
In addition to what sauoq wrote: I'm not convinced that the Fisher-Yates shuffle actually has very much theory with regard to random order involved at all.

On the surface, each array entry appears to be given a single opportunity to be swapped with another array entry. In practice, array entries nearer the beginning of the array have additional opportunities to be swapped (as a result of later swaps), meaning that less shuffling happens at the end of the array than at the beginning. One could argue at great length whether the additional shuffling improves the randomness or produces patterns in the long term, but it isn't difficult to argue that whichever conclusion is valid, one end of the array will be improved more than the other, since entries near the beginning have more chances of being shuffled than entries near the end. Therefore the algorithm is not optimal.

UPDATE: See node Fisher-Yates theory... does this prove that it is invalid? for my 'proof' that Fisher-Yates weights the results.
by Abigail-II (Bishop) on Jul 24, 2003 at 08:04 UTC
"I'm not convinced that the Fisher-Yates shuffle actually has very much theory with regard to random order involved at all."

Well, you are wrong. The first publication of the Fisher-Yates shuffle dates from 1938: it's example 12 in the book Statistical Tables by R. A. Fisher and F. Yates. It was also discussed by R. Durstenfeld in 1964 (Communications of the ACM, volume 7, page 420). And of course Donald Knuth mentions it, in Volume 2 of The Art of Computer Programming; in the third edition, it's Algorithm P, section 3.4.2, page 145.

Abigail
If there was ever a node that deserves to make the Best Nodes, that's it... I bookmarked it so I can read that article from ACM's digital library.
UPDATE: I read the paragraph in the ACM digital library, for algorithm 235. How did you ever find that citation? It is the algorithm, but he doesn't credit Fisher-Yates...
by Abigail-II (Bishop) on Jul 25, 2003 at 06:13 UTC
by RMGir (Prior) on Jul 25, 2003 at 11:19 UTC
by MarkM (Curate) on Jul 25, 2003 at 01:00 UTC
UPDATE: As pointed out by others, I made an error when translating the code. See their summaries for good explanations. Cheers, and thanks everyone. (tail between legs)

In a previous article, I was challenged for doubting the effectiveness of the Fisher-Yates shuffle as described in perlfaq. Below, I have written code that exhausts all possible random sequences that could be used during a particular Fisher-Yates shuffle. Statistically, this should be valid: before the shuffle begins, the random sequence 0 0 0 0 0 is exactly as likely to be generated as 0 1 2 3 4 or 4 4 4 4 4. By exhaustively executing the Fisher-Yates shuffle and counting how many times each result set is produced, we can determine whether the Fisher-Yates shuffle has the side effect of weighting the results, or whether the shuffle is truly random, in which case the results should be spread out evenly.
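(The original code isn't reproduced here. The following is a minimal sketch of the exhaustive test described, written with the correct shrinking pick range int rand($i + 1) — presumably where the translation went wrong, per the update above.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $size = 4;
my %count;

# Run one Fisher-Yates shuffle, drawing its "random" numbers from a
# fixed, pre-generated pick sequence instead of rand().
sub shuffle {
    my ($rand) = @_;
    my @deck = (1 .. $size);
    for my $i (reverse 1 .. $size - 1) {
        my $j = shift @$rand;               # stands in for int rand($i + 1)
        @deck[$i, $j] = @deck[$j, $i];
    }
    $count{"@deck"}++;
}

# Enumerate every possible pick sequence: the pick at step $i ranges
# over 0 .. $i, exactly the values int rand($i + 1) can return.
sub enumerate {
    my ($i, @picks) = @_;
    return shuffle([@picks]) if $i < 1;
    enumerate($i - 1, @picks, $_) for 0 .. $i;
}

enumerate($size - 1);
printf "%s => %d\n", $_, $count{$_} for sort keys %count;
```

With the shrinking range, all 4! = 24 pick sequences map to distinct results, so every permutation is printed with a count of exactly 1.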
With the above code, I was able to determine that with a deck size of 5 and an initial set of 1 2 3 4 5, the resulting set 3 1 2 5 4 is three times as probable as the resulting set 2 3 4 5 1. To me, this indicates that the theory is flawed. If anybody needs to prove to themselves that the test is exhaustive, print out "@$rand" in the shuffle subroutine. Please analyze the code carefully, pull out your school books, and see if I have made a mistake. Cheers,
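(For contrast — a sketch, not MarkM's actual code — here is the full-range variant, where every pick is int rand($size). Enumerating a 3-card deck reproduces exactly the kind of weighting reported above.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The biased variant: every pick comes from the full range
# 0 .. $size - 1, so there are $size ** $size equally likely pick
# sequences -- which cannot divide evenly over $size! permutations.
my $size = 3;
my %count;

sub naive_shuffle {
    my ($rand) = @_;
    my @deck = (1 .. $size);
    for my $i (0 .. $size - 1) {
        my $j = shift @$rand;               # stands in for int rand($size)
        @deck[$i, $j] = @deck[$j, $i];
    }
    $count{"@deck"}++;
}

sub enumerate {
    my (@picks) = @_;
    return naive_shuffle([@picks]) if @picks == $size;
    enumerate(@picks, $_) for 0 .. $size - 1;
}

enumerate();
printf "%s => %d\n", $_, $count{$_} for sort keys %count;
# Prints counts of 4 and 5 out of 27: the outcomes are weighted.
```

The root cause is pure counting: 27 equally likely pick sequences cannot map uniformly onto 6 permutations, and the imbalance grows with deck size.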
by jsprat (Curate) on Jul 25, 2003 at 01:51 UTC
by BrowserUk (Patriarch) on Jul 25, 2003 at 02:04 UTC
by adrianh (Chancellor) on Jul 25, 2003 at 01:58 UTC
by halley (Prior) on Jul 24, 2003 at 13:18 UTC
If someone says they're not convinced, you can't disprove that. One may heap on additional evidence to attempt to convince them, but they may continue to be unconvinced for rational or irrational reasons. This site often discusses matters of faith. There's no right or wrong about being convinced.

"You are wrong" marks an absolutism, both in its grammar and in its connotation. I prefer a community which demonstrates respect for others' faith and others' opinions.

Fisher-Yates may have been proven effective by analyzing the results for uniform potential entropy. It may have been proven by logical analysis of the probabilities of each successive swap. Sharing that evidence is helpful, but I ask politely for everyone to build a constructive community, not one which promotes controversy and arrogance.
by demerphq (Chancellor) on Jul 24, 2003 at 17:10 UTC
by jsprat (Curate) on Jul 24, 2003 at 08:30 UTC
"On the surface, each array entry appears to be given a single opportunity to be swapped with another array entry. In practice, array entries nearer the beginning of the array have additional opportunities to be swapped (as a result of later swaps), meaning that less shuffling happens at the end of the array than at the beginning."

That is what intuition says, but in this case intuition falls short of reality. A Fisher-Yates shuffle avoids (not introduces) bias by making the pool smaller on each iteration.

There are n! possible permutations of a set of n items. After the first iteration, an F-Y shuffle gives n possible outcomes, each equally likely. The second iteration yields (n - 1) outcomes for each of the n possibilities, leaving us with n*(n-1) possibilities - again, each equally likely. Follow that to its conclusion and you get n(n-1)(n-2)...1 possibilities, each equally likely.

For example, take a 3-item set. There are 3! (= 6) possible permutations of this set when it is shuffled. On the first iteration of the loop there are three possibilities: a-(1 2 3), b-(2 1 3), and c-(3 2 1). The second iteration only swaps the 2nd and 3rd elements, so for a you have an equal possibility of (1 2 3) and (1 3 2); for b, (2 1 3) and (2 3 1); for c, (3 2 1) and (3 1 2). None of the possibilities are duplicated, and each one has a 1/6 chance of being selected.
Six possibilities, each equally likely.

Another way to look at it is this: the first element has a 2/3 chance of getting swapped away on the first iteration and then a 1/2 chance of landing in either remaining slot on the second - giving it a 2/3 * 1/2 = 1/3 chance of ending up in any given slot.

Update: Finally, Re: Re: Re: random elements from array - new twist shows a statistical analysis of a Fisher-Yates shuffle.

Whew, I'm done. I hope this wasn't homework - or if it was, Anonymous Monk learned something ;-)
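(A quick empirical check — a minimal simulation sketch, not from the post, using the forward variant jsprat describes. Each of the 3! = 6 permutations should come up at a rate of about 1/6 ≈ 0.1667.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shuffle (1, 2, 3) many times and tally the outcomes.
my %tally;
my $trials = 600_000;

for (1 .. $trials) {
    my @a = (1, 2, 3);
    # Forward Fisher-Yates: swap element $i with a random element
    # from $i to the end; the pool shrinks each iteration.
    for my $i (0 .. $#a - 1) {
        my $j = $i + int rand(@a - $i);
        @a[$i, $j] = @a[$j, $i];
    }
    $tally{"@a"}++;
}

printf "%s: %.4f\n", $_, $tally{$_} / $trials for sort keys %tally;
```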
by sauoq (Abbot) on Jul 24, 2003 at 06:31 UTC
"One could argue at great length whether the additional shuffling improves the randomness or produces patterns in the long term,"

If your RNG were perfect, additional shuffles would neither improve nor degrade the randomness.

"but it isn't difficult to argue that whichever conclusion is valid, one end of the array will be improved more than the other... therefore the algorithm is not optimal."

Unless neither conclusion is valid (see above), in which case a single shuffle is no better or worse than multiple shuffles and the algorithm is optimal. In practice, however, our RNGs aren't perfect, and you may have a point. I suppose you could resolve it by keeping track of the original indexes of the elements, then applying your map and doing a second F-Y shuffle starting from the previous last element and moving backwards to the previous first... still linear in time and space... but I doubt it is worth it. :-)

-sauoq
"My two cents aren't worth a dime.";
by Abigail-II (Bishop) on Jul 24, 2003 at 08:25 UTC
This was pointed out by R. Salfi: COMPSTAT 1974 (Vienna, 1974), pp. 28-35. See also the documentation of the Shuffle module on CPAN.

Abigail
by rir (Vicar) on Jul 24, 2003 at 12:58 UTC
"If your RNG were perfect, additional shuffles would neither improve nor degrade the randomness."

It seems the "additional shuffling" is required to prevent a permutation of the array where there is a correlation between items that are swapped. Consider the shuffling of an array of three items, starting from the left, without the so-called "additional shuffling".

Updated: Per Abigail's criticism of a mistyped table and obscurity. To clarify, I take issue with:

"On the surface, each array entry appears to be given a single opportunity to be swapped with another array entry. In practice, array entries nearer the beginning of the array have additional opportunities to be swapped (as a result of later swaps), meaning that less shuffling happens at the end of the array than at the beginning. One could argue at great length whether the additional shuffling improves the randomness or produces patterns in the long term, but it isn't difficult to argue that whichever conclusion is valid, one end of the array will be improved more than the other, since entries near the beginning have more chances of being shuffled than entries near the end. Therefore the algorithm is not optimal."

This seems to say that the possible movement of an already-moved item is a weakness of the algorithm. But an array cannot be randomly shuffled in place without allowing elements to be moved more than once. (With three items, for instance, if each element may move at most once, the net result is a product of disjoint swaps, so a cyclic rearrangement like (2 3 1) can never be produced.)

The idea that the errors of an RNG will more adversely affect the items to which the RNG is more often applied is a more subtle argument. I reject it out of hand: if the RNG is incorrect, the results will not be random, and we are quibbling about how obvious the error is. This distinction may be very important in practice, and there are practical solutions to get a pseudo-randomness that is okay, for varying definitions of okay.
by Abigail-II (Bishop) on Jul 24, 2003 at 13:16 UTC
by gjb (Vicar) on Jul 24, 2003 at 08:06 UTC
IMHO, the definition of a "good" shuffle algorithm is that, starting from a sequence, each of its permutations has the same probability of being the outcome of the shuffle on that starting sequence. Many "home grown" algorithms violate this criterion and hence are to be considered, well, bad shuffle algorithms. Fisher & Yates satisfies this criterion, however (the proof is left as "homework" to the original poster).

You can have a look at Algorithm::Numerical::Shuffle for references as well as a short discussion.

Just my 2 cents, -gjb-
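(For the record, the "homework" is a one-liner — a sketch, not from the post. At step $k$ the algorithm picks uniformly among the $n - k + 1$ elements not yet placed, so any fixed permutation $\sigma$ is produced with probability

$$P(\sigma) = \frac{1}{n} \cdot \frac{1}{n-1} \cdots \frac{1}{1} = \frac{1}{n!},$$

the same for every $\sigma$, which is exactly the criterion above.)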
Re: Fisher-Yates theory
by sauoq (Abbot) on Jul 24, 2003 at 05:09 UTC
The following is straight out of the FAQ. Use perldoc -q shuffle to read the whole entry.
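(The code itself is not reproduced here; from memory, the perlfaq4 routine reads approximately as follows — run perldoc -q shuffle for the authoritative version.)

```perl
# Approximately the perlfaq4 routine (check perldoc -q shuffle for
# the authoritative text): shuffles the referenced array in place.
sub fisher_yates_shuffle {
    my $deck = shift;                   # $deck is a reference to an array
    my $i = @$deck;
    while ($i--) {
        my $j = int rand($i + 1);       # pick from the unshuffled part
        @$deck[$i, $j] = @$deck[$j, $i];
    }
}

# Usage:
# my @cards = (1 .. 52);
# fisher_yates_shuffle(\@cards);
```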
Update: Uh. Gee... you did ask for the theory, not the code. Sorry. What is there to tell you? It's O(N) for obvious reasons, and it works on the data in place, so memory usage is minimal. It's a very straightforward algorithm.

-sauoq
"My two cents aren't worth a dime.";
by Abigail-II (Bishop) on Jul 24, 2003 at 08:13 UTC
For details, see the Knuth reference I made in another post in this thread.

Abigail
Re: Fisher-Yates theory
by Skeeve (Parson) on Jul 24, 2003 at 05:58 UTC
BTW: This is not a Perl question.
by artist (Parson) on Jul 24, 2003 at 06:06 UTC
by PodMaster (Abbot) on Jul 24, 2003 at 07:38 UTC
On a slight sidenote, PerlMonks is a community. Say it with me, not a forum ;)
Re: Fisher-Yates theory
by bart (Canon) on Aug 06, 2003 at 12:47 UTC
As an example, take a deck of cards, 52 of them. Pick any card. You'll agree that the card chosen is completely random, no? No preference towards the old order? OK. Now the most important step: take it out of the deck. That's the first card of the new, shuffled deck.

Now you have a deck of 51 cards left. Pick one, any one... That's your second card, completely random. Now you have a deck of 50 cards left... Repeat, taking one random card out of the old deck each time and adding it to the new deck, until you're left with an empty old deck. Your new deck now contains all the cards and is shuffled completely randomly.

As an optimisation for the implementation, note that the total number of cards in both decks is always the same. So instead of physically creating two decks, we take a paper bookmark and insert it between the old deck and the new deck. We move the chosen card out of the old deck and into the new deck simply by swapping it with the first card of the old deck, and then moving the bookmark one position up so that the chosen card is now the last card of the new deck.

Since the order of the new deck is completely independent of the order of the old deck, doing it this way - changing the order of the old deck as you go - doesn't influence the statistical properties of the end result at all.
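(A minimal sketch, not from the post, mapping this bookmark picture onto code: the index $i plays the bookmark, everything below it is the new deck, everything from it onward is the old deck.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# bart's two-deck picture, done in place: cards below index $i are
# the new deck; cards at $i and above are the old deck.
sub shuffle_deck {
    my @deck = @_;
    for my $i (0 .. $#deck - 1) {
        # Pick a random card from the old deck...
        my $pick = $i + int rand(@deck - $i);
        # ...swap it with the first card of the old deck, then the
        # bookmark moves up: that card is now last in the new deck.
        @deck[$i, $pick] = @deck[$pick, $i];
    }
    return @deck;
}

my @shuffled = shuffle_deck(1 .. 52);
print "@shuffled\n";
```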