in reply to RE: RE (tilly) 1: Fisher-Yates Shuffle
in thread Fisher-Yates Shuffle
Now how many times will you swap an element for itself? On average the last element gets swapped with itself 1/n of the time, the next one 1/(n-1) of the time, and so on down to 1/1. That is:
1 + 1/2 + 1/3 + ... + 1/n

Which is, after suitable rearranging:
1/2 + 1/(2n) + (1 + 1/2)/2 + (1/2 + 1/3)/2 + (1/3 + 1/4)/2 + ... + (1/(n-1) + 1/n)/2

Aside from the pieces at the ends, this turns out to be a sum of local trapezoidal approximations to the integral from 1 to n of 1/x. The error terms turn out to converge to a constant, and the integral that you are approximating is log(n) - log(1) = log(n). (Natural logs, of course, are the only kind that most mathematicians care about.)
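In case the "suitable rearranging" looks like magic, here it is spelled out (my notation, not in the original post):

\[
\sum_{k=1}^{n} \frac{1}{k}
  = \frac{1}{2} + \frac{1}{2n}
  + \sum_{k=1}^{n-1} \frac{1}{2}\left(\frac{1}{k} + \frac{1}{k+1}\right),
\qquad
\frac{1}{2}\left(\frac{1}{k} + \frac{1}{k+1}\right) \approx \int_{k}^{k+1} \frac{dx}{x}.
\]

Summing the right-hand approximations from k = 1 to n-1 gives the integral from 1 to n.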
Therefore on average the number of swaps you save is log(n) plus a complicated constant plus some small terms.
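If you would rather check that empirically than take my word for it, here is a quick sketch (my own throwaway code, not the shuffle from this thread) that counts the would-be self-swaps:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Count how often a Fisher-Yates pass picks the index it is already at.
    # The per-run average should sit a constant away from log($n).
    my ($n, $runs) = (1_000, 2_000);
    my $self_swaps = 0;

    for (1 .. $runs) {
        my @deck = (1 .. $n);
        for (my $i = $#deck; $i > 0; $i--) {
            my $j = int rand($i + 1);
            $self_swaps++ if $j == $i;        # this swap would be a no-op
            @deck[ $i, $j ] = @deck[ $j, $i ];
        }
    }

    printf "average self-swaps per run: %.3f\n", $self_swaps / $runs;
    printf "log(n)                    : %.3f\n", log($n);

The two printed numbers should differ only by a constant, which is the point.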
Now this is on average. What will happen realistically? Well, instead of adding up numbers we are adding up random variables. The calculation this time makes the previous one I sketched seem trivial. (Note, the variables are not independent.) But glory hallelujah! The variance of the sum for any number of terms turns out to be bounded by a constant (and not a big one either), so in fact you can put an absolute upper bound on the likelihood that it varies by more than, say, 10 from that average. So you can say that within (eg) a 99% confidence interval it will lie within some fixed distance of log(n), and that estimate will be true for n=10, n=1000, n=1,000,000, and n=10^100.
Now how does this fit in big-O notation? Well, first of all, big-O notation as defined by Knuth doesn't really fit randomized algorithms. Darn. (Easy enough to modify, though.) Secondly, people don't really use the notation the same way that Knuth did. (eg Hash lookups as implemented in Perl are O(n). Everyone calls them O(1). Conversely, O(n) algorithms are really O(n*n) as well, but we say, "That is O(n*n), so it is not a good algorithm.") Double darn. But hey, that is OK. After all, big-O was invented by Bachmann in 1894 for number theory, not algorithms, and little-o came later from Landau, who popularized both. Language does change.
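For reference, the textbook definition (roughly the form Bachmann, Landau, and Knuth all use) only promises an upper bound:

\[
f(n) = O(g(n)) \iff \exists\, C > 0,\ n_0 \ \text{such that}\ |f(n)| \le C\, g(n) \ \text{for all}\ n \ge n_0,
\]

which is exactly why anything that is O(n) is also, trivially, O(n*n).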
Besides which, your test (averaging repeated runs) is going to measure average behaviour, not the outliers.
Therefore whether or not you argue about the use of the phrase "O(1)", my underlying comment is an accurate statement about what you are trying to measure. The benefit of the branch will be seen, on average, log(n) times plus some constant on an array of size n. Putting the test in will cost you n times, once per iteration.
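If anyone wants to time it rather than argue about it, here is a rough benchmark sketch (generic Fisher-Yates variants of my own, not necessarily the exact code from this thread), using the standard Benchmark module:

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @data = (1 .. 10_000);

    # Always swap, even when $i == $j.
    sub shuffle_plain {
        my @a = @_;
        for (my $i = $#a; $i > 0; $i--) {
            my $j = int rand($i + 1);
            @a[ $i, $j ] = @a[ $j, $i ];
        }
        return @a;
    }

    # Pay for a test on every iteration to skip the rare self-swap.
    sub shuffle_branch {
        my @a = @_;
        for (my $i = $#a; $i > 0; $i--) {
            my $j = int rand($i + 1);
            next if $j == $i;               # the branch under discussion
            @a[ $i, $j ] = @a[ $j, $i ];
        }
        return @a;
    }

    cmpthese(-3, {
        plain  => sub { shuffle_plain(@data) },
        branch => sub { shuffle_branch(@data) },
    });

On 10,000 elements the branch gets tested 9,999 times in order to skip roughly log(10,000), about 9, useless swaps, which is the trade-off described above.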
An incidental note. Perl compiles everything down to portable op codes, not assembler, and runs those codes through an interpreter. Therefore thinking about how a C compiler would solve a problem is missing the point. perl simply does not deal with the physical machine at the same level that, say, gcc does. That if check is going to be dealt with as an interpreted statement.
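You can watch that happen (a quick sketch; the exact dump varies between perl versions) by asking the core B::Concise module for the op tree. The comparison and the conditional each show up as ops of their own that the runloop dispatches on every pass:

    perl -MO=Concise -e 'my ($i, $j) = (0, 1); if ($i != $j) { print "swap\n" }'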
BTW take a look at my home node. I may have ranted a bit here, but that is because you wandered too close to what I am really good at. Namely math. :-)
EDIT
Good and out of practice. :-(
When I put it on paper, the random variables actually were independent, with variances of the form (k-1)/(k*k), so the variance of the sum is again log(n) + O(1), and the standard deviation is O(sqrt(log(n))). Therefore on a run you save:
log(n) + O(sqrt(log(n))) iterations...
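Spelling that out (my write-up of the calculation, with X_i the indicator that step i picks its own index, so P(X_i = 1) = 1/i):

\[
\operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big)
  = \sum_{i=1}^{n} \frac{1}{i}\Big(1 - \frac{1}{i}\Big)
  = \sum_{i=1}^{n} \frac{1}{i} - \sum_{i=1}^{n} \frac{1}{i^2}
  = \log n + O(1),
\]

since the squared terms sum to a constant (pi^2/6 in the limit), and the standard deviation is therefore O(sqrt(log(n))).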