That has the advantage of just one rand().
Things you have to look out for, though:
the total weight not being 1.0 (or some other "well known" value).
any of the weights being very, very much smaller than the other weights -- or anything else that will cause problems with floating point subtraction.
the final value $r being greater than the weight of the last item (due to rounding).
BTW, I think the comparison should be $r < 0.
One could stick the weight at the end of the list:
use strict ; use warnings ;
my $RUNS = 100_000 ;
my $data = [[a => 0.1], [b => 0.5], [c => 0.4], [c => 0]] ;
my %count;
foreach (1 .. $RUNS) {
my $total_weight = $data->[-1]->[1] ;
if (!$total_weight) {
$total_weight += $_->[1] for @$data ;
$data->[-1]->[1] = $total_weight ;
} ;
my $pick;
my $r = rand($total_weight) ;
foreach (@$data) { if (($r -= $_->[1]) < 0) { $pick = $_->[0] ; la
+st ; } ; } ;
$count{$pick}++ ;
} ;
while (my ($k, $v) = each %count) {
printf "%s: %5.2f%%\n", $k, 100 * $v / $RUNS ;
} ;
__END__
c: 39.89%
a: 9.92%
b: 50.19%
Noting that we can duplicate the last value in the total weight entry, so in the (unlikely) event of not stopping on the last real entry, the last value will come from the weight entry.
This is, like the other solutions, a linear search along your list of possible values. You can speed this up by putting the popular stuff earlier in the list. If that is still too slow -- you have lots of possible values and you're doing this a lot, you can sub divide the list into groups, and prefix each group with its total weight -- then step along the groups reducing $r as above, and then within the selected group increasing $r.... but only if you really have to !
|