in reply to How do I get a random index with weighting?

something like the code below should be alright:
my $h = { a => 0.1, b => 0.5, c => 0.4, }; for (1..10000) { $s{rand_key($h)}++; } print "output: ".join(" ", %s),"\n"; sub rand_key { my $h = shift; my $r = rand(); for (keys %$h) { $r-=$h->{$_}; return $_ if($r < 0); } return undef; } __END__ output: c 4024 a 1017 b 4959
edit: thanks oshalla, corrected the comparison

Replies are listed 'Best First'.
Re^2: How do I get a random index with weighting?
by gone2015 (Deacon) on Nov 25, 2008 at 01:03 UTC

    That has the advantage of just one rand().

    Things you have to look out for, though:

    • the total weight not being 1.0 (or some other "well known" value).

    • any of the weights being very, very much smaller than the other weights -- or anything else that will cause problems with floating point subtraction.

    • the final value $r being greater than the weight of the last item (due to rounding).

    BTW, I think the comparison should be $r < 0.

    One could stick the weight at the end of the list:

    use strict ; use warnings ; my $RUNS = 100_000 ; my $data = [[a => 0.1], [b => 0.5], [c => 0.4], [c => 0]] ; my %count; foreach (1 .. $RUNS) { my $total_weight = $data->[-1]->[1] ; if (!$total_weight) { $total_weight += $_->[1] for @$data ; $data->[-1]->[1] = $total_weight ; } ; my $pick; my $r = rand($total_weight) ; foreach (@$data) { if (($r -= $_->[1]) < 0) { $pick = $_->[0] ; la +st ; } ; } ; $count{$pick}++ ; } ; while (my ($k, $v) = each %count) { printf "%s: %5.2f%%\n", $k, 100 * $v / $RUNS ; } ; __END__ c: 39.89% a: 9.92% b: 50.19%

    Noting that we can duplicate the last value in the total weight entry, so in the (unlikely) event of not stopping on the last real entry, the last value will come from the weight entry.

    This is, like the other solutions, a linear search along your list of possible values. You can speed this up by putting the popular stuff earlier in the list. If that is still too slow -- you have lots of possible values and you're doing this a lot, you can sub divide the list into groups, and prefix each group with its total weight -- then step along the groups reducing $r as above, and then within the selected group increasing $r.... but only if you really have to !