in reply to Re: Generating 0 .. N Randomly and Efficiently
in thread Generating 0 .. N Randomly and Efficiently

blokhead,
If my random number generator happened to pick 0,4,1,2,3 as the initial segment of the permutation, then with probability 1/2 the next number in the permutation will be either 5 or 9, while the other 1/2 of the probability is shared among 6, 7, and 8.

That is not actually the case. Remember that every time an endpoint is chosen, the endpoints move. The number being selected is only within the endpoints. In your example, all unchosen numbers would have the same probability of being selected.

To get around this, don't search left and right when you hit a bit that's already been chosen. Just keep choosing randomly until you hit an unselected bit. With probability 1 you will eventually terminate ;) This is the only way to choose a permutation truly at random.
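That resample-until-unselected idea can be sketched as follows. This is a Python illustration of the suggestion, not code from the thread (which is in Perl); the bit string is stood in for by a plain list of flags:

```python
import random

def random_perm_rejection(n):
    """Uniformly random permutation of 0..n-1 by rejection sampling:
    keep drawing random indices until an unselected one is hit."""
    selected = [False] * n   # stands in for the bit vector
    perm = []
    for _ in range(n):
        while True:
            j = random.randrange(n)
            if not selected[j]:
                break        # terminates with probability 1
        selected[j] = True
        perm.append(j)
    return perm
```

Each draw is uniform over all n slots, so conditioned on landing in an unselected slot the choice is uniform over the unselected ones; hence every permutation is equally likely. The price is extra draws: by the coupon-collector argument the expected total number of draws is O(n log n).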

In my admittedly twisted mind, this is much closer to being random than you are giving it credit for.

As the holes grow larger and larger, the two numbers on either side of each hole have a higher chance of being selected, but the result is still random (just not statistically uniform). In theory, the holes should grow at the same rate, though I wouldn't be willing to bet on it.

Thanks for the ideas and suggestions - I may check them out.

Cheers - L~R


Re^3: Generating 0 .. N Randomly and Efficiently
by blokhead (Monsignor) on Oct 19, 2004 at 19:40 UTC
    In your example, all unchosen numbers would have the same probability of being selected.
    Yes, that's my point ;)
    ...this is much closer to being random then you are giving it credit for...

    The selection point itself is random
    The direction to choose from is random

    ...still random (just not statistically uniform)...

    <nitpick>

    Then you need to qualify your use of the word "random." In the context of selecting things at "random," the word is meaningless without a probability distribution. If the distribution is not specified, then the only reasonable probability distribution to assume is the uniform distribution. It usually doesn't make sense to choose from a finite set of objects in a nonuniform way. It is for this reason that in CS we often say "choose at random" to implicitly mean "choose uniformly at random."

    But please, do not use an unqualified "random" to mean "complicated to predict" or "generated by a process that uses the random number generator" or even "chosen according to some probability distribution that assigns positive probability to all outcomes." Yes, there is a difference, and this is where confusion starts.

    Yes indeed, for many cases your code would be "random enough," in that the probability of most permutations is not hugely different from 1/N!. But is this "random enough"?

    sub random_perm {
        my $n = shift;
        return rand() < 0.2 ? (1 .. $n) : reverse(1 .. $n);
    }
    Well, it's random, but not uniform. It just happens to follow my personal favorite probability distribution ;) If there is a criterion for being "random enough", then you'd have to define what it means for a probability distribution to be "random enough" and then show convincingly that your code meets it!

    Basically my point is that choosing something "randomly enough" or "close to randomly" or whatever is a meaningless expression, and the concept behind it is shaky.

    </nitpick>

    Anyway, I realize this is a little tangential from the topic at hand and I take full responsibility for jumping all over it. But I often see this term misused/abused and wanted to have my thoughts on public record ;) What were we talking about again? Oh yes, permutations using no memory ;)

    blokhead

      blokhead,
      Ok, color me confused. First you say:

      If my random number generator happened to pick 0,4,1,2,3 as the initial segment of the permutation, then with probability 1/2 the next number in the permutation will be either 5 or 9, while the other 1/2 of the probability is shared among 6, 7, and 8. They are not all equally likely! In a truly random permutation, all yet-unselected numbers will be equally likely as being the next one to appear. Emphasis mine.

      Then I point out that, no, in my code and that situation all unchosen numbers would indeed have the same probability of being chosen. And then you say:

      Yes, that's my point ;)

      Um - which is it?

      With regards to the probability distribution: My solution is not self-correcting. It starts out assuming a uniform probability distribution but does not correct for it if it randomly goes awry. The "holes" should grow uniformly at the same rate, since the initial selection is random, the direction to look is random, and the endpoints move to keep the pool balanced on the ends. The trouble is that not all holes do form evenly. I can't prove that this is no different from a truly random permutation, but I haven't seen any proof that it is different, either.

      Cheers - L~R

        You seem to have a basic misunderstanding.

        If 0,4,1,2,3,5,9,6,7,8 has different odds of happening than 0,4,1,2,3,6,8,9,7,5 then by definition you aren't selecting a truly random permutation. It may be pretty random. It may be good enough for some purposes. But it isn't truly random.

        With your algorithm there are correlations between where numbers appear. There shouldn't be.

        About the bigger problem: if you explored this problem in detail, here is what you'd find. With the pure bit-string implementation you can reduce (in Perl) memory requirements by a factor of several hundred. In C you can reduce them by a factor of 30 or so. (Perl has more overhead in its data structures...) In either language the running time goes like O(n*n). The algorithm you have to use at each step is: at step i, pick a random number j from 1 to (n-i), count off until you have passed j unselected bits, announce which number that was at, and select that bit. The only optimization available is to count from the large end if j > (n-i)/2.
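        That step-by-step selection can be sketched like this. It is a Python illustration, not the Perl being discussed, and it uses a bytearray of flags where the real thing would use a packed bit string:

        ```python
        import random

        def random_perm_bitstring(n):
            """O(n*n) uniform permutation: at step i pick a rank j in 1..(n-i),
            then count off unselected bits to find the j'th one."""
            bits = bytearray(n)   # 0 = unselected (a real bit string would pack these)
            perm = []
            for i in range(n):
                m = n - i                       # unselected bits remaining
                j = random.randint(1, m)        # rank of the element to select
                if j <= m // 2:                 # target is in the left half
                    pos, step = 0, 1
                else:                           # the one optimization: scan from
                    pos, step = n - 1, -1       # the large end when j > (n-i)/2
                    j = m - j + 1               # same element, ranked from the right
                count = 0
                while True:
                    if not bits[pos]:
                        count += 1
                        if count == j:
                            break
                    pos += step
                bits[pos] = 1
                perm.append(pos)
            return perm
        ```

        Each step is a linear scan over up to n/2 bits, giving the O(n*n) total time, while the only persistent state is the n-bit string itself.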

        Every way of improving that run time either comes at the cost of accepting a wrong answer, or of using more memory. You've tried the track of making the answer wrong. An approach that uses more memory is to take the string and divide it into blocks of 33 bytes. The first byte says how many unselected bits there are in the next 32 bytes. Then when you walk to find the j'th bit, you can just look at the first byte, the 34th byte, the 67th byte, etc., until you narrow it down to a block of 256 bits that you can then scan. This is still O(n*n) runtime, but the performance improvement is so dramatic that it may be usable when the absolutely memory-efficient version is not.
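        A sketch of that blocked layout, again in Python rather than the thread's Perl. For simplicity the per-block counts live in a separate list instead of a header byte in front of each 32-byte run; the skipping logic is the same:

        ```python
        import random

        BLOCK = 256  # bits per block, matching the 32-data-byte layout above

        def random_perm_blocked(n):
            """Same algorithm, but with a per-block count of unselected bits
            so whole blocks are skipped while locating the j'th unselected bit."""
            nblocks = (n + BLOCK - 1) // BLOCK
            bits = bytearray(n)   # 0 = unselected
            free = [min(BLOCK, n - b * BLOCK) for b in range(nblocks)]
            perm = []
            for i in range(n):
                j = random.randint(1, n - i)   # rank of the element to select
                b = 0
                while free[b] < j:             # skip whole blocks via their counts
                    j -= free[b]
                    b += 1
                pos = b * BLOCK                # scan only inside one 256-bit block
                count = 0
                while True:
                    if not bits[pos]:
                        count += 1
                        if count == j:
                            break
                    pos += 1
                bits[pos] = 1
                free[b] -= 1
                perm.append(pos)
            return perm
        ```

        The worst case is still O(n*n), but each step now touches n/256 counts plus one 256-bit block instead of walking up to n/2 individual bits.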

        Adding memory only helps to a point. The shuffle solution that several people give is O(n) memory usage and O(n) runtime. Doing things that take more memory than that is counterproductive - the time taken to allocate and throw around memory has to be worse than what you can gain. If you're not running out of memory, that is the right solution to use. Period. (If you are running out of memory, I'd suggest implementing that one in C before doing something sophisticated in Perl.)
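        The shuffle solution mentioned above is the Fisher-Yates shuffle; a minimal Python sketch (the thread's own code would be Perl):

        ```python
        import random

        def shuffle_perm(n):
            """O(n) time, O(n) memory: Fisher-Yates shuffle, which produces
            every permutation of 0..n-1 with probability exactly 1/n!."""
            perm = list(range(n))
            for i in range(n - 1, 0, -1):
                j = random.randint(0, i)          # uniform over 0..i inclusive
                perm[i], perm[j] = perm[j], perm[i]
            return perm
        ```

        The key to uniformity is that position i is swapped with an index drawn from 0..i inclusive; drawing from the full range 0..n-1 at every step is a common bug that produces a nonuniform distribution.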