in reply to Fisher-Yates Shuffle

Cool!

But note: the larger the array, the more the branch will hurt you. On average, the number of times you get to break out before the swap when shuffling n elements is only log(n)+O(1).

So while the branch is a loss for an array with 100,000 elements, that is close to a worst case for having it.

RE: RE (tilly) 1: Fisher-Yates Shuffle
by Adam (Vicar) on Aug 30, 2000 at 05:25 UTC
    Right. But big-O notation is concerned with worst case.

    But I think your log(n) time is wrong. Consider the fact that each pass through the loop is dealing with a smaller array. (Speaking of which, I ought to have used a pre-decrement instead of a post-decrement... the last pass is a waste of time.) So the value of the branch decreases as the size of the array increases, right? Think about it this way: the odds of branching on the first pass through an array of size N are 1/N. The next pass through, the relative array size is N-1. So for any given array of size N, the odds of branching at all are between 1 and 1/N * 1/(N-1) * 1/(N-2) and so on. That makes the high end, where every single loop branched, equal to 1/(N!). The worst-case scenario of branching often gets pretty rare pretty quick. That branch gets pretty useless pretty fast, no?

    So what's the penalty of having the branch? Well, the compiler can do one of two things with the assembly code. One, it guesses that the branch is common, and arranges the assembly to jump to a different memory block when the two indexes aren't equal. So you pay the price of doing that test and jumping pretty often... not good. Or, the compiler guesses that the branch is rare, and has to test for equivalence on every pass, and you pay the price. (This is more likely what Perl did, considering how close the benchmarks were.) Note that the test itself is simple. It's just a pass through an AND gate and maybe a NOT gate. (There is a third, even uglier, possibility... that Perl actually uses a cached-guess strategy, where it guesses that if it branched last time then it will branch again, and vice versa. This is a good strategy when the odds are closer to 50-50, but not in this case.)

    But if you don't introduce the branch at all, you pay a different penalty... which is why I posted this Meditation in the first place: now you have removed the test for equivalence on every pass, and removed two lines of assembly code, but you are also trying to swap an array element with itself. If there were no penalty for that then the branch would never make sense, but I doubt this very much. So the question is... how severe is the penalty? Is it worse than checking if two 32-bit ints are equal? Probably. But not much. And since it's more likely that you'll have to do the swap anyway... you might as well, right?

      You make one pass through the array. At each step you make one swap. The idea is that at the first step you choose what element belongs at the end of the array. At the second step you choose the next-to-last element of the array out of what remained. And so on until you get to choosing the first element out of... gee... one element. :-)
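      To make that concrete, here is a sketch of the loop in Python rather than Perl (purely for illustration; the function name and the skip_self_swaps flag are mine, not from the code under discussion):

```python
import random

def fisher_yates(array, skip_self_swaps=True):
    """Shuffle `array` in place. Optionally skip the swap when the
    chosen index equals the current one (the branch under debate).
    Returns how many swaps the branch saved."""
    skipped = 0
    for i in range(len(array) - 1, 0, -1):
        j = random.randint(0, i)  # choose among the i+1 remaining slots
        if skip_self_swaps and i == j:
            skipped += 1          # break out before the swap
            continue
        array[i], array[j] = array[j], array[i]
    return skipped
```

      Counting skipped over many runs is one way to check the log(n)+O(1) figure empirically.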

      Now how many times will you swap an element for itself? On average the last element gets swapped with itself 1/n of the time, the next one 1/(n-1), etc., down to 1/1. That is:

      1 + 1/2 + 1/3 + ... + 1/n
      
      Which is, after suitable rearranging:
      1/2 + 1/(2n) +
           (1 + 1/2)/2 +
           (1/2 + 1/3)/2 +
           (1/3 + 1/4)/2 +
           .  .  .       +
           (1/(n-1) + 1/n)/2
      
      Aside from the pieces at the ends, this turns out to be a sum of local trapezoidal approximations to the integral from 1 to n of 1/x. The error terms turn out to converge to a constant, and the integral that you are approximating is log(n)-log(1)=log(n). (Natural logs, of course, are the only kind that most mathematicians care about.)

      Therefore on average the number of swaps you save is log(n) plus a complicated constant plus some small terms.
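      If you want to see that constant, the partial sums are easy to check numerically; a quick sketch (the constant it settles toward is the Euler-Mascheroni constant, roughly 0.5772):

```python
import math

def harmonic(n):
    """1 + 1/2 + ... + 1/n: the average number of swaps saved."""
    return sum(1.0 / k for k in range(1, n + 1))

# harmonic(n) - log(n) settles down toward ~0.5772 as n grows
for n in (10, 1000, 100000):
    print(n, harmonic(n) - math.log(n))
```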

      Now this is on average. What will happen realistically? Well, instead of adding up numbers we are adding up random variables. The calculation this time makes the previous one I sketched seem trivial. (Note, the variables are not independent.) But glory hallelujah! The variance of the sum for any number of terms turns out to be bounded by a constant (and not a big one either), so in fact you can put an absolute upper bound on the likelihood that it varies by more than, say, 10 from that average. So you can say that within (e.g.) a 99% confidence interval it will lie within some fixed distance of log(n), and that estimate will be true for n=10, n=1000, n=1,000,000, and n=10^100.
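      That concentration is easy to see empirically; here is a quick Monte Carlo sketch (n, the seed, and the number of trials are arbitrary choices of mine):

```python
import math
import random
import statistics

def count_self_swaps(n, rng):
    """One Fisher-Yates pass over n dummy elements, counting how often
    the chosen index equals the current one. The final i = 0 step is
    included to match the 1/1 term above (it always hits)."""
    return sum(rng.randint(0, i) == i for i in range(n - 1, -1, -1))

rng = random.Random(1)
n = 1000
counts = [count_self_swaps(n, rng) for _ in range(500)]
# The mean hugs log(n) plus a constant; the spread stays small.
print(statistics.mean(counts), statistics.stdev(counts), math.log(n))
```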

      Now how does this fit in big-O notation? Well, first of all, big-O notation as defined by Knuth doesn't really fit for random algorithms. Darn. (Easy enough to modify, though.) Secondly, people don't really use the term the same way that Knuth did. (E.g. hash lookups as implemented in Perl are O(n). Everyone calls them O(1). Conversely, O(n) algorithms are really O(n*n) as well, but we say, "That is O(n*n), so it is not a good algorithm.") Double darn. But hey, that is OK. After all, big-O and little-o were invented by Bachmann in 1894 for number theory, not algorithms. (And were later popularized by Landau.) Language does change.

      Besides which, your test (averaging repeated runs) is going to test average behaviour, and not the outliers.

      Therefore, whether or not you argue about the use of the phrase "O(1)", my underlying comment is an accurate statement about what you are trying to measure. The benefit of the branch will be seen, on average, log(n) times plus some constant on an array of size n. Putting the test in will cost you n times.

      An incidental note. Perl compiles everything down to portable opcodes, not assembler, and runs those opcodes through an interpreter. Therefore thinking about how a C compiler would solve a problem is missing the point. perl simply does not deal with the physical machine at the same level that, say, gcc does. That if check is going to be dealt with as an interpreted statement.

      BTW take a look at my home node. I may have ranted a bit here, but that is because you wandered too close to what I am really good at. Namely math. :-)

      EDIT
      Good and out of practice. :-(

      When I put it on paper, the random variables actually were independent, with variance (k-1)/(k*k) at the step that chooses among k slots, so the variance of the sum is again log(n)+O(1), and the standard deviation O(sqrt(log(n))). Therefore on a run you save:

      log(n) + O(sqrt(log(n)))
      
      iterations...
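      For the record, that variance sum can be checked numerically too; the step that chooses among k slots contributes (1/k)(1 - 1/k) = (k-1)/k^2, and the total tracks log(n) plus a constant:

```python
import math

def selfswap_variance(n):
    """Sum of per-step variances (k-1)/k^2 for k = 1..n."""
    return sum((k - 1) / k**2 for k in range(1, n + 1))

# The sum is log(n) + O(1); the constant works out to about -1.0677
for n in (100, 10000):
    print(n, selfswap_variance(n), math.log(n))
```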
        First of all... don't worry about ranting. I took no offense, I'm just trying to understand this shuffle algorithm. Specifically I don't understand why the branch test is in there.

        Ok, on with the math! I almost failed numerical analysis and it's been years since I've opened a calc book, so I'll trust you on your calculation of averages. Actually, your math makes sense up till the bit about trapezoids. <grin>

        However, I do NOT believe that I misread the Fisher-Yates algorithm... it's exactly as you said, I just re-conceptualized it. Either way you look at it, the size of the range for rand is decremented by one each time. This means that the range for the first pass at an array of size N is the same as the size of the range for the second pass of an array of size N+1. I think we're on the same page, I'm just off on the fringes. Which is ok.

        What this all boils down to is that the branch is not needed.

        Oh,
        As a side note, please don't tell me not to think about it in terms of assembly... everything has to boil down to machine code at some point, and the easiest way to think in terms of machine code is assembly. I don't really care what path it takes to get there... C, Perl opcodes, whatever. The reality is that it's all 1s and 0s. And every extra mucking about with them takes time. The trick to optimizing an algorithm like this is to determine the best way to minimize the extra electron traffic. What varies from language to language is how much control you have over that traffic. Perl doesn't give you much, but I am trying to learn where it does.

        So what you are saying (roughly) is that the branch takes O(n) time (it's tested n times) to save you O(log n) work on a task which would otherwise be O(n)?

        With branch: O(n) - O(log n), still O(n)

        No branch: O(n) - O(n) + O(log n), still O(n).

        In the long run, branch or no branch, the time is still dominated by n calls to the random function.

        Pride goeth before the fall.

        :-(

        I may be good at math, but I have not been doing math for a number of years, and after a while you start to get stupid. Like I was above.

        Everything that I said is true, except that the average number of collisions is actually 1/n + 1/n + ... + 1/n, and not 1+1/2+1/3+...+1/n.

        Rather important detail that. :-(

        So read the math, it is kind of fun except for that rather basic mistake. Then go looking up the hat-check problem.

        (/me feels like an idiot...)

        Which makes that branch make even less sense incidentally.