in reply to Generating 0 .. N Randomly and Efficiently

With your algorithm, you won't get a truly random permutation of 0..N. Expanding on what ysth has said already, suppose N=9. If my random number generator happened to pick 0,4,1,2,3 as the initial segment of the permutation, then with probability 1/2 the next number in the permutation will be either 5 or 9, while the other 1/2 of the probability is shared among 6, 7, and 8. They are not all equally likely! In a truly random permutation, all yet-unselected numbers will be equally likely as being the next one to appear.

To get around this, don't search left and right if you hit a bit that's already been chosen. Just keep choosing and choosing randomly until you hit an unselected bit. With nonzero probability you will terminate ;) This is the only way to choose a permutation truly at random (other than filtering out (or bookkeeping) the unselected bits and randomly picking from them).

Another approach which still builds the entire list, but you may still appreciate because it needs only one call to rand is this: If you look at the swaps made in the Fisher-Yates shuffle, you can build a 1-to-1 mapping from the swaps you performed to the integers between 1 and N!. This is a ranking function, and it induces an ordering over the permutations. An unranking function just does the inverse, converts a number between 1 and N! into the swaps needed to build the permutation:

sub unrank { my ($n, $rank) = @_; my @perm = (0 .. $n); my @swap; for (reverse 2 .. $n) { $swap[$_] = $rank % $_; $rank = int($rank / $_); } for (2 .. $n) { @perm[ $swap[$_], $_ ] = @perm[ $_, $swap[$_] ] if $swap[$_]; } return @perm[1 .. $n]; }
In other words, this converts the uniform random distribution over the integers from 1 to N! into the uniform random distribution over permutations of N. Ranking/unranking functions is a standard way to select combinatorial structures uniformly at random using just one call to rand (compare to F-Y shuffle, which needs N calls to rand to return a random permutation). Usually you rank/unrank to the standard lexicographic ordering, but in the case of permutations, the ordering induced by the F-Y shuffle actually lets you rank and unrank faster.

You could try to unravel this unranking function so that it doesn't build the list in memory. In other words, given the rank r of a permutation in some ordering, and an index i, what is the i-th position of the r-th permutation? I'm not sure it's possible with this ordering, but perhaps in a lexicographic ranking/unranking scheme it would be.

For the record, I got a lot of this information about ranking and unranking permutations from the online book Combinatorial generation, which has lots of interesting algorithmic material relating to permutations, combinations, subsets, trees, partitions, etc. Specifically look at the section of chapter 4 relating to permutations (and exercise 23).

Update: I wrote this, ran off to class, and in the middle of lecture realized my unrank function was a bit off.. It's fixed now ;)

blokhead

Replies are listed 'Best First'.
Re^2: Generating 0 .. N Randomly and Efficiently
by Limbic~Region (Chancellor) on Oct 19, 2004 at 17:47 UTC
    blokhead,
    If my random number generator happened to pick 0,4,1,2,3 as the initial segment of the permutation, then with probability 1/2 the next number in the permutation will be either 5 or 9, while the other 1/2 of the probability is shared among 6, 7, and 8.

    That is not actually the case. Remember that every time an endpoint is chosen, the endpoints move. The number being selected is only within the endpoints. In your example, all unchosen numbers would have the same probability of being selected.

    To get around this, don't search left and right if you hit a bit that's already been chosen. Just keep choosing and choosing randomly until you hit an unselected bit. With nonzero probability you will terminate ;) This is the only way to choose a permutation truly at random

    In my, admittedly twisted, mind - this is much closer to being random then you are giving it credit for.

    • The endpoints automatically adjusts to keep from bouncing between 2 numbers
    • The selection point itself is random
    • The direction to choose from is random

    As there are larger and larger holes, the two numbers on either sides of those holes have a higher chance of being selected, but it is still random (just not statistically uniform). In theory, the holes should grow at the same rate, though I wouldn't be willing to make that bet.

    Thanks for the ideas and suggestions - I may check them out.

    Cheers - L~R

      In your example, all unchosen numbers would have the same probability of being selected.
      Yes, that's my point ;)
      ...this is much closer to being random then you are giving it credit for...

      The selection point itself is random
      The direction to choose from is random

      ...still random (just not statistically uniform)...

      <nitpick>

      Then you need to qualify your use of the word "random." In the context of selecting things at "random," the word is meaningless without a probability distribution. If the distribution is not specified, then the only reasonable probability distribution to assume is the uniform distribution. It usually doesn't make sense to choose from a finite set of objects in a nonuniform way. It is for this reason that in CS we often say "choose at random" to implicitly mean "choose uniformly at random."

      But please, do not use an unqualified "random" to mean "complicated to predict" or "generated by a process that uses the random number generator" or even "chosen according to some probability distribution that assigns positive probability to all outcomes." Yes there is a difference, and this is where confusions start.

      Yes indeed, for many cases, your code would be "random enough," in that the probability of most permutations is not hugely different from 1/N!. But is this "random enough?"

      sub random_perm { my $n = shift; return rand() < 0.2 ? (1 .. $n) : reverse (1 .. $n); }
      Well, it's random, but not uniform. It just happens to follow my personal favorite probability distribution ;) If there is a criteria for being "random enough", then you'd have to define what it means for a probability distribution to be "random enough" and then show convincingly that your code follows it!

      Basically my point is that choosing something "randomly enough" or "close to randomly" or whatever is a meaningless expression, and the concept behind it is shaky.

      </nitpick>

      Anyway, I realize this is a little tangential from the topic at hand and I take full responsibility for jumping all over it. But I often see this term misused/abused and wanted to have my thoughts on public record ;) What were we talking about again? Oh yes, permutations using no memory ;)

      blokhead

        blokhead,
        Ok, color me confused. First you say:

        If my random number generator happened to pick 0,4,1,2,3 as the initial segment of the permutation, then with probability 1/2 the next number in the permutation will be either 5 or 9, while the other 1/2 of the probability is shared among 6, 7, and 8. They are not all equally likely! In a truly random permutation, all yet-unselected numbers will be equally likely as being the next one to appear. Emphasis mine.

        Then I point out that, no, in my code and that situation all unchosen numbers would indeed have the same probability of being chosen. And then you say:

        Yes, that's my point ;)

        Um - which is it?

        With regards to probability distribution: My solution is not auto-correcting. It starts out assuming uniform probability distribution but does not correct for it if it randomly goes awry. The "holes" should grow uniformly at the same rate since initial selection is random, direction to look is random, and end points move to keep the pool balanced on the ends. The trouble is that not all holes do form evenly. I can't prove that this is no different than a random permutation but I haven't seen any proof that it isn't either.

        Cheers - L~R