in reply to Re^2: Generating 0 .. N Randomly and Efficiently
in thread Generating 0 .. N Randomly and Efficiently

I wonder... if your aim is to learn something, then why is your approach so much like most newbies' approaches to shuffling arrays? I mean, picking an item, and if it's already been picked, try again... Just see Google Groups for examples on the newsgroups. That just seems like a step backward to me...

Anyway, though you do take care to make sure every loop finds at least one item, you will not get a nice even distribution. Let me give an example.

Suppose that your $N is 20, and by some weird coincidence, the numbers 1 .. 10 have already been picked, and nothing else. For your next pick, the numbers 12 to 20 all have 1 chance in 21 of being picked. So do the numbers 0 and 11. But if the drawn number is between 1 and 10 (about 10 chances in 21, that is almost 50%), your system will promote it to either 0 or 11...

Net result: there's 12 chances in 21, that is over 50%, that the next number picked, will be either 0 or 11.

So yes, this way, you'll tend to get clustered results, instead of a nicely spread, speckled, outcome.

  • Comment on Re^3: Generating 0 .. N Randomly and Efficiently

Replies are listed 'Best First'.
Re^4: Generating 0 .. N Randomly and Efficiently
by Limbic~Region (Chancellor) on Oct 19, 2004 at 18:56 UTC
    bart,
    I mean, picking an item, and if it's already been picked, try again

    That is not what this code does. It automatically defaults to finding the nearest unselected bit. It does this using a regex to find the first byte that is not all ones and then examining that byte bit by bit.

    Anyway, though you do take care to make sure every loop finds at least one item, you will not get a nice even distribution.

    While I didn't say statistically uniform distribution was a necessity, it is a nicetey.

    Suppose that your $N is 20, and by some weird coincidence, the numbers 1 .. 10 have already been picked, and nothing else. For your next pick, the numbers 12 to 20 all have 1 chance in 21 of being picked.

    This is just not true. The random number is not selected over the entire range but only between 2 floating end points. Those end points move everytime they are selected.

    Net result: there's 12 chances in 21, that is over 50%, that the next number picked, will be either 0 or 11.

    This is also not true. Given the way the code works in your example, all unchosen bits have the same probability of being chosen.

    As I have indicated in other replies, uneven statistical distribution is still possible even though:

    • The end points automatically change
    • The initial guess between the end points is random
    • The direction to look after a miss is random
    This is because as holes get larger, the probability one of the two numbers on either side will be chosen increases. Theoretically those holes should all grow at the same rate, but I haven't taken the time to analyze it.

    Cheers - L~R