Re: Biased random number selection

If you're choosing between n different values, it's very easy to make an order(n) routine to do this, as someone else has already shown.

But, you can do better. If you pre-build an array of partial frequency sums, then you have a sorted array, and you can do a binary search on it for the correct number. This would be faster, If n is relatively large, And if the frequency table doesn't change very much.

# Warning: Untested code, almost definitely has off by one errors!

# Given a frequency table in @freq, build a summed frequency table.
my $sum;
@sf = map {$sum += $_} @freq;

# Now we choose a random number in the range of total frequency sum
my $target = rand($sum);

# To find out what the Answer is (weighted freq from 0 to n),
# we now find n such that $sf[n-1]< $target < $sf[n]

# We keep track of the numbers we know $n is between
my $a = 0; my $b = @sf; my $n;

# repeat until we narrow down to one answer
while ($a < $b) {
    $n = int(($a + $b) / 2);
    if ($sf[$n] < $target) {
        $a = $n;
    } elsif ($sf[$n-1] > $target) {
        $b = $n;
    } else { 
        # $target < $sf[n] and $target > $sf[n-1],
        # so we have an answer.
        last;
    }
}

print "I found $n\n";
[download]

Like I said, this is not going to work as-is, but hopefully you get the idea. There is a greater overhead in the loop than doing a linear search, and a greater setup cost to build the array of summed frequencies. But the loop runs in order (log n) time instead of order (n) time, so for sufficiently large n, this can be a great improvement over a straight linear search (use benchmarking to make sure it's a Win).

If you have a known frequency distribution (ie the frequencies are all almost equal, or there are a lot of low frequencies and a few high ones), you may be able to get Even Faster results if you use some heuristics to choose a better method than "look halfway between my known upper and lower bounds for n." For example, if your frequencies are all pretty equal, you can get a better idea of a guess for $n by seeing how far away $sf[$n] is from $target, and moving $n an appropriate distance, instead of just going halfway.

Hope this helps. Does anyone know of a good binary search module?

Alan

Comment on Re: Biased random number selection Select or Download Code

Replies are listed 'Best First'.
RE: Re: Biased random number selection by athomason (Curate) on Jul 12, 2000 at 01:08 UTC
Very cool. The summed frequency array was one of the ideas I implemented, though I shot myself in the foot after that part. I was so determined to squeeze every ounce of speed out that I constructed an array of length `$sf[n]`, with `$freq[i]` elements with value `i` for each `i`. The lookup is O(1), which beats the pants off O(logn) and O(n). Unfortunately, my `n` actually is fairly large (~10,000), and I ran out of memory pretty quickly (even with 2GB/proc). The O(n) linear search over `@sf` was going to be my next attempt, but I'll certainly compare it to the binary search.	[reply] [d/l] [select]
RE: RE: Re: Biased random number selection by ferrency (Deacon) on Jul 12, 2000 at 01:37 UTC
Yep, overall it's a typical speed/space tradeoff. Another benefit of the linear or binary traversal solutions is, your frequencies needn't be integers (well, assuming rand() works correctly with non-integers, that is).	[reply]