Re: Generating 0 .. N Randomly and Efficiently
by bart (Canon) on Oct 19, 2004 at 15:57 UTC
|
I'd just generate an array containing the numbers 0 .. N, and shuffle it.
use List::Util 'shuffle';
my @random = shuffle 0 .. $N;
The problem with your approach is that not only the chances of misses rises as you have selected more items — it is not even garanteed to ever finish; but also, as you try to take the nearest item to your latest miss, I'm quite sure its randomness will not be that uniform, as items close to clusters of items that have been selected already, have a higher chance of being picked next. | [reply] [d/l] |
|
|
bart,
I'd just generate an array containing the numbers 0 .. N, and shuffle it.
As would I if I wasn't trying to learn new things by intentionally trying to avoid simple/easy solutions.
The problem with your approach is that not only the chances of misses rises as you have selected more items — it is not even garanteed to ever finish
In my approach, I minimized the hit for making a miss by just selecting the nearest unchosen bit. The direction to look for that bit is random. Additionally, if there are none available, it randomly selects an endpoint which is guaranteed to be unchosen. It is guaranteed to finish.
| [reply] |
|
|
I wonder... if your aim is to learn something, then why is your approach so much like most newbies' approaches to shuffling arrays? I mean, picking an item, and if it's already been picked, try again... Just see Google Groups for examples on the newsgroups. That just seems like a step backward to me...
Anyway, though you do take care to make sure every loop finds at least one item, you will not get a nice even distribution. Let me give an example.
Suppose that your $N is 20, and by some weird coincidence, the numbers 1 .. 10 have already been picked, and nothing else. For your next pick, the numbers 12 to 20 all have 1 chance in 21 of being picked. So do the numbers 0 and 11. But if the drawn number is between 1 and 10 (about 10 chances in 21, that is almost 50%), your system will promote it to either 0 or 11...
Net result: there's 12 chances in 21, that is over 50%, that the next number picked, will be either 0 or 11.
So yes, this way, you'll tend to get clustered results, instead of a nicely spread, speckled, outcome.
| [reply] |
|
|
|
|
The problem with your approach is that not only the chances of misses rises as you have selected more items — it is not even garanteed to ever finish;
Unless there's a bug in the implementation, the approach will finish. In fact, it's bounded by O(N2). Each iteration will find a new number - if the initial guess hit an already picked number, a scan for an unpicked number is made. And until all numbers have been picked, a scan will find an unpicked number.
Now, the approach won't be fair (due to clustering), and it won't be fast (quadratic vs. linear), and the implementation might contain bugs (and is huge for such a simple problem), the algorithm will finish (the implementation might not).
| [reply] |
Re: Generating 0 .. N Randomly and Efficiently
by blokhead (Monsignor) on Oct 19, 2004 at 17:31 UTC
|
With your algorithm, you won't get a truly random permutation of 0..N. Expanding on what ysth has said already, suppose N=9. If my random number generator happened to pick 0,4,1,2,3 as the initial segment of the permutation, then with probability 1/2 the next number in the permutation will be either 5 or 9, while the other 1/2 of the probability is shared among 6, 7, and 8. They are not all equally likely! In a truly random permutation, all yet-unselected numbers will be equally likely as being the next one to appear.
To get around this, don't search left and right if you hit a bit that's already been chosen. Just keep choosing and choosing randomly until you hit an unselected bit. With nonzero probability you will terminate ;) This is the only way to choose a permutation truly at random (other than filtering out (or bookkeeping) the unselected bits and randomly picking from them).
Another approach which still builds the entire list, but you may still appreciate because it needs only one call to rand is this: If you look at the swaps made in the Fisher-Yates shuffle, you can build a 1-to-1 mapping from the swaps you performed to the integers between 1 and N!. This is a ranking function, and it induces an ordering over the permutations. An unranking function just does the inverse, converts a number between 1 and N! into the swaps needed to build the permutation:
sub unrank {
my ($n, $rank) = @_;
my @perm = (0 .. $n);
my @swap;
for (reverse 2 .. $n) {
$swap[$_] = $rank % $_;
$rank = int($rank / $_);
}
for (2 .. $n) {
@perm[ $swap[$_], $_ ] = @perm[ $_, $swap[$_] ] if $swap[$_];
}
return @perm[1 .. $n];
}
In other words, this converts the uniform random distribution over the integers from 1 to N! into the uniform random distribution over permutations of N. Ranking/unranking functions is a standard way to select combinatorial structures uniformly at random using just one call to rand (compare to F-Y shuffle, which needs N calls to rand to return a random permutation). Usually you rank/unrank to the standard lexicographic ordering, but in the case of permutations, the ordering induced by the F-Y shuffle actually lets you rank and unrank faster.
You could try to unravel this unranking function so that it doesn't build the list in memory. In other words, given the rank r of a permutation in some ordering, and an index i, what is the i-th position of the r-th permutation? I'm not sure it's possible with this ordering, but perhaps in a lexicographic ranking/unranking scheme it would be.
For the record, I got a lot of this information about ranking and unranking permutations from the online book Combinatorial generation, which has lots of interesting algorithmic material relating to permutations, combinations, subsets, trees, partitions, etc. Specifically look at the section of chapter 4 relating to permutations (and exercise 23).
Update: I wrote this, ran off to class, and in the middle of lecture realized my unrank function was a bit off.. It's fixed now ;)
| [reply] [d/l] |
|
|
blokhead,
If my random number generator happened to pick 0,4,1,2,3 as the initial segment of the permutation, then with probability 1/2 the next number in the permutation will be either 5 or 9, while the other 1/2 of the probability is shared among 6, 7, and 8.
That is not actually the case. Remember that every time an endpoint is chosen, the endpoints move. The number being selected is only within the endpoints. In your example, all unchosen numbers would have the same probability of being selected.
To get around this, don't search left and right if you hit a bit that's already been chosen. Just keep choosing and choosing randomly until you hit an unselected bit. With nonzero probability you will terminate ;) This is the only way to choose a permutation truly at random
In my, admittedly twisted, mind - this is much closer to being random then you are giving it credit for.
- The endpoints automatically adjusts to keep from bouncing between 2 numbers
- The selection point itself is random
- The direction to choose from is random
As there are larger and larger holes, the two numbers on either sides of those holes have a higher chance of being selected, but it is still random (just not statistically uniform). In theory, the holes should grow at the same rate, though I wouldn't be willing to make that bet.
Thanks for the ideas and suggestions - I may check them out.
| [reply] |
|
|
In your example, all unchosen numbers would have the same probability of being selected.
Yes, that's my point ;)
...this is much closer to being random then you are giving it credit for...
The selection point itself is random
The direction to choose from is random
...still random (just not statistically uniform)...
<nitpick>
Then you need to qualify your use of the word "random." In the context of selecting things at "random," the word is meaningless without a probability distribution. If the distribution is not specified, then the only reasonable probability distribution to assume is the uniform distribution. It usually doesn't make sense to choose from a finite set of objects in a nonuniform way. It is for this reason that in CS we often say "choose at random" to implicitly mean "choose uniformly at random."
But please, do not use an unqualified "random" to mean "complicated to predict" or "generated by a process that uses the random number generator" or even "chosen according to some probability distribution that assigns positive probability to all outcomes." Yes there is a difference, and this is where confusions start.
Yes indeed, for many cases, your code would be "random enough," in that the probability of most permutations is not hugely different from 1/N!. But is this "random enough?"
sub random_perm {
my $n = shift;
return rand() < 0.2
? (1 .. $n)
: reverse (1 .. $n);
}
Well, it's random, but not uniform. It just happens to follow my personal favorite probability distribution ;) If there is a criteria for being "random enough", then you'd have to define what it means for a probability distribution to be "random enough" and then show convincingly that your code follows it!
Basically my point is that choosing something "randomly enough" or "close to randomly" or whatever is a meaningless expression, and the concept behind it is shaky.
</nitpick>
Anyway, I realize this is a little tangential from the topic at hand and I take full responsibility for jumping all over it. But I often see this term misused/abused and wanted to have my thoughts on public record ;) What were we talking about again? Oh yes, permutations using no memory ;)
| [reply] [d/l] |
|
|
|
|
Re: Generating 0 .. N Randomly and Efficiently
by ysth (Canon) on Oct 19, 2004 at 17:03 UTC
|
I have the feeling that searching forward or backward won't result in a well shuffled result; I'd think you'd tend to end up with some ranges of numbers clumping toward the end and some toward the beginning. Just a feeling, though...
One alternative is to do random selection (repeating until successful) for the first X percent of N, then switching to scanning through the whole bitmap until you find the rand(number-remaining)'th remaining number. Some benchmarks would be needed to pick a good X; I'd guess 90%. | [reply] |
|
|
ysth,
I haven't spent a lot of time analyzing the distribution frequency, but this implementation does have the possibility of clumping as both you and bart indicate. This, however, is only one of dozens of variations I tried over the last couple of days and others were better at being more statistically uniform.
I was hoping someone else would give it a try to see what they could come up with as there are far smarter monks here than myself. I know it seems silly. The answer without restrictions is obvious, assign to an array and shuffle. It just seems to be that there should be more efficient ways to reduce the search space, determine an unchosen bit randomly without a lot of work cycles, etc. It is obvious to me I need to learn more about bitstrings and bitwise operators in general.
| [reply] |
Re: Generating 0 .. N Randomly and Efficiently
by tmoertel (Chaplain) on Oct 20, 2004 at 01:20 UTC
|
You may wish to consider the following variant of your problem:
Generate a randomly permuted M-length subset of the
integers in the range 0..N.
What makes this variant especially interesting is that the trivial
solutions based upon shuffling 0..N are no longer viable
because of inefficiency. (Consider the case when M=5 and
N=2^48.)
In Jon Bentley's book More Programming Pearls: Confessions of a
Coder (ISBN 0201118890), he discusses this problem at length
and describes one of Knuth's solutions before arriving at the
following, beautiful algorithm by the late Robert W Floyd:
# The following algorithm is due to Bob Floyd and is
# described in Jon Bentley's, _More Programming Pearls_
# (1988), on p. 143.
#
# In this Perl implementation, I use a hash as an efficient,
# randomly-addressable series that supports insertion in
# O(1) time. This implementation selects subsets of size $m
# from the range [0,$n], without replacement. (The original
# from Bentley's book used the range [1,n]).
sub randomly_permuted_subset_of_range {
my ($m, $n) = @_;
my (@result, %series, $head);
for (my $j = $n-$m+1; $j <= $n; $j++) {
my $t = int rand($j+1);
if (exists $series{$t}) {
# insert $j into series after $t
@series{$t,$j} = ($j, $series{$t});
}
else {
# prefix $t to series
($head, $series{$t}) = ($t, $head);
}
}
for ($n = $head; defined $n; $n = $series{$n}) {
push @result, $n;
}
return \@result;
}
The algorithm runs in time O(M) and requires storage
of only M cells. (My implementation actually requires
2M cells because, for easy Data Dumping, I copy the
resulting sequence into an array before returning it, but this step
isn't required. BTW, I use the term "cell" to represent the
approximate storage required by a hash entry or an array element.)
Some examples:
use Data::Dumper;
$Data::Dumper::Indent = 0;
print Dumper( randomly_permuted_subset_of_range(5, 10) );
print Dumper( randomly_permuted_subset_of_range(11, 10) );
print Dumper( randomly_permuted_subset_of_range(5, 2**48) );
And the corresponding output:
$VAR1 = [2,1,7,0,5];
$VAR1 = [7,8,1,5,3,2,4,10,0,9,6];
$VAR1 = ['211572034652645','198222342410498',
'42862092043890','253205057797894',
'185109518737615'];
Note that when M=N+1, Floyd's algorithm solves your
original problem for any given N.
Cheers, Tom
| [reply] [d/l] [select] |
Re: Generating 0 .. N Randomly and Efficiently
by Random_Walk (Prior) on Oct 19, 2004 at 17:14 UTC
|
Here is a first shot, I get perl to allocate me a zero filled string of the correct size. calculate a chance step based on number of numbers left to find. Pick a random number then decrement this in countstep increments each time if find an unset bit in my string. once this is < 0 I look for the next unset bit (if I am not on one), set the bit and return its index. I can think of a couple of optimisations, moving the endpoints would be a help. for the early numbers I may partiton the space and dive into about the right subblock searching from there although this does risk reducing the randomness...
#!/usr/local/bin/perl
use strict;
my $number=1000;
my $string;
vec $string, $number, 1;
while ($number) {
my $chance_step = 1/$number;
my $rand=rand;
my $i=0;
while ($rand > 0) {
if (vec $string, $i++, 1) {
next
} else {
$rand-=$chance_step
}
}
while (vec $string, $i, 1){$i++}
vec($string, $i, 1)=1;
print "$i, ";
$number--
}
Cheers, R. | [reply] [d/l] |
|
|
Random_Walk,
This is very much like one of my first implementations. The trouble is that by using vec to determine if a bit has been selected, it will get slower and slower the closer to the end it gets. Try changing $number to 50_000 and see what happens the longer it runs.
| [reply] [d/l] |
|
|
The problem is not with vec, it is the way I start at the start of the string and count off non set bits till I reach $rand. The following timings are on a dev machine being used by a few so there is plenty of noise. The time to read the last bits of a long string using vec looks constant even with strings spanning several orders of magnitude.
#!/usr/local/bin/perl -w
use strict;
use Benchmark qw(timethese);
timethese(5000000, {
'1K' => vectest(1000),
'10K' => vectest(10000),
'100K' => vectest(100000),
'1M' => vectest(1000000),
'10M' => vectest(10000000),
});
sub vectest {
my $n=shift;
my $s="";
vec $s,$n,1;
my $look=vec $s,$n-10,1
}
__END__
Benchmark: timing 5000000 iterations of 100K, 10K, 10M, 1K, 1M...
100K: 0 wallclock secs ( 0.59 usr + 0.00 sys = 0.59 CPU)
10K: 1 wallclock secs ( 0.42 usr + 0.00 sys = 0.42 CPU)
10M: 1 wallclock secs ( 0.49 usr + 0.00 sys = 0.49 CPU)
1K: 1 wallclock secs ( 0.89 usr + 0.00 sys = 0.89 CPU)
1M: 1 wallclock secs ( 0.51 usr + 0.00 sys = 0.51 CPU)
I have a serious optimisation up my sleave but and rather too busy to code it now :( Hope to have a shot when I get home this evening
Cheers, R. | [reply] [d/l] |
Re: Generating 0 .. N Randomly and Efficiently
by Velaki (Chaplain) on Oct 19, 2004 at 16:05 UTC
|
This looks like shuffling a deck of cards. Wouldn't it be easier to create an array of the appropriate size, populate it, shuffle it, and then print the numbers?
#!/usr/bin/perl
use strict;
use warnings;
my $SIZE = 10;
my @a = (0..$SIZE);
for (@a) {
my $i = int(rand($SIZE+1));
($a[$i],$a[$_])=($a[$_],$a[$i])
}
print "$_\n" for @a;
Hope this helped,
-v
"Perl. There is no substitute."
| [reply] [d/l] |
|
|
| [reply] |
|
|
How does Velaki's solution use more memory (except maybe for (@a), which could be replaced with for ($j=0; $j<@a; $j++))?
Update: Ah yes, you return an interator instead of building an actual list.
| [reply] [d/l] [select] |
|
|
Re: Generating 0 .. N Randomly and Efficiently
by Random_Walk (Prior) on Oct 21, 2004 at 16:28 UTC
|
This was harder than I expected, I have been plagued by off by one errors and infinite loops but here is a version with a couple of optimations
- Moving start point up when start bits full
- moving endpoint down when it makes sense
- making a guess where to start the search for each number
It starts out nice and fast but bogs down for big sets looking for those last few unused values and I must admit it is pig ugly code. I think overall Limbic~Region's way of doing it is preferable.
#!/usr/local/bin/perl
use strict;
use integer;
$!++;
my $number=10000;
my $num_per_block=10;
my $string;
my $start=0;
my $end=$number-1;
vec $string, $end, 1;
while ($number) {
my $block_size=(($end-$start+1)/$number)*$num_per_block;
my $rand=int rand($number+1);
my $found=1; # optimist
my $guess_block=int($rand / $block_size);
my $i=$start+($guess_block*$block_size);
my $fiddled_rand=$rand-($guess_block*$num_per_block);
while ($fiddled_rand > -1) {
if (vec $string, $i, 1) {
if ($i==$end+1) {
# we ran off the end without getting a good number
$found=0;
last;
}
}else{
$fiddled_rand--;
}
$i++;
}
unless ($found) {
$i=$start;
while ($rand) {
unless (vec $string, $i++, 1) {
$rand--;
}
}
}
print "$i+";
vec($string, ($i-1), 1)=1;
while (vec $string, $start, 1){$start++}
while (vec $string, $end, 1){$end--}
$number--;
}
I have another cunning plan to prevent that horrible slowdown near the end but it is at the cost of more memory (step 1 is thrown this abomination away ;)
Cheers, R. | [reply] [d/l] |