in reply to Re^4: Generating 0 .. N Randomly and Efficiently
in thread Generating 0 .. N Randomly and Efficiently

You seem to have a basic misunderstanding.

If 0,4,1,2,3,5,9,6,7,8 has different odds of happening than 0,4,1,2,3,6,8,9,7,5 then by definition you aren't selecting a truly random permutation. It may be pretty random. It may be good enough for some purposes. But it isn't truly random.

With your algorithm there are correlations between where numbers appear. There shouldn't be.

About the bigger problem, if you explored this problem in detail, here is what you'd find. With the pure bit-string implementation you can reduce (in Perl) memory requirements by a factor of several hundred. In C you can reduce it by a factor of 30 or so. (Perl has more overhead in its data structures...) In either language the time to run goes like O(n*n). The algorithm that you have to use at each step is, At step i pick a random number j from 1 to (n-i), count off until you have j unselected bits, announce which number that was at and select that bit. The only optimization that you can use is to count from the large end if j > (n-i)/2.

Every way of improving that run time either comes at the cost of accepting a wrong answer, or using more memory. You've tried the track of making the answer wrong. An approach that uses more memory is to take the string and dividing it into blocks of 33 bytes. The first byte says how many unselected bits there are in the next 32 bytes. Then when you walk to find the j'th bit, you can just look at the first byte, the 34'th byte, the 67'th byte, etc until you narrow it down to a block of 256 bits that you can then scan. This is still O(n*n) runtime, but the performance improvement is so dramatic that it may be useable when the absolutely memory efficient version is not.

Adding memory only helps to a point. The shuffle solution that several people give is O(n) memory usage and O(n) runtime. Doing things that take more memory than that is counterproductive - the time taken to allocate and throw around memory has to be worse than what you can gain. If you're not running out of memory, that is the right solution to use. Period. (If you are running out of memory, I'd suggest implementing that one in C before doing something sophisticated in Perl.)

  • Comment on Re^5: Generating 0 .. N Randomly and Efficiently