m_al has asked for the wisdom of the Perl Monks concerning the following question:

Given a data structure similar to:
{ a => 0.1, b => 0.5, c => 0.4, }
How can I pick a random element, using the values as weights? I'm going for a more generic solution than applies to this specific hash type, since currently I actually have a matrix like
[ [a, 0.1], [b, 0.5] ... ]
(but I'm going to change it because it's crazy). TIA al

Replies are listed 'Best First'.
Re: How do I get a random index with weighting?
by kyle (Abbot) on Nov 24, 2008 at 21:55 UTC
Re: How do I get a random index with weighting?
by JavaFan (Canon) on Nov 24, 2008 at 23:20 UTC
    Classical problem. Knuth discusses it in TAOCP. One way of doing it:
    my $RUNS = 100_000; my $data = [[a => 0.1], [b => 0.5], [c => 0.4]]; my %count; foreach (1 .. $RUNS) { my $pick; my $w = 0; rand($w += $$_[1]) < $$_[1] and $pick = $$_[0] for @$data; $count{$pick}++; } while (my ($k, $v) = each %count) { printf "%s: %5.2f%%\n", $k, 100 * $v / $RUNS; } __END__ c: 39.88% a: 9.93% b: 50.19%
Re: How do I get a random index with weighting?
by bart (Canon) on Nov 24, 2008 at 23:35 UTC
    I developed a formula to randomly pick N items based on their weights (for what you ask for, N == 1) that I could use on an array and even in SQL, that you can find here. The basic idea is to sort the items based on a calculated random value from the expression
    -log(1 - rand)/$weight
    where $weight is the weight factor for the current item. You can find the rationale behind the formula in that post.

    For one item, you can reduce the code to picking the item where this expression has the lowest value. There's no need to actually sort them all.

    Here is code for the reduced problem:

    my @matrix = ( [a => 0.1], [b => 0.5], [c => 0.4] ); my $chosen = pick(@matrix); sub pick { my($choice, $threshold); foreach(@_) { my $rand = -log(1 - rand)/$_->[1]; if(not defined $threshold or $rand < $threshold) { $choice = $_->[0]; $threshold = $rand; } } return $choice; }
    For better randomness of a huge number of draws, it's better to pick a random number generator that has a higher number of bits (and a larger cycle) than the default one (only 15 bits on ActivePerl/Windows!), for example Math::Random::MT which can be used as a plug in replacement if you import (and override) rand.
    use Math::Random::MT qw(rand);

    To demonstrate that it works as advertised, here are the results of a test run of 100_000 draws:

    my %picked; for(1 .. 1E5) { $picked{pick(@matrix)}++; } use Data::Dumper; print Dumper \%picked;
    Output:
    $VAR1 = { 'c' => 40200, 'a' => 10042, 'b' => 49758 };
    which looks very close to the desired distribution, to me.

    p.s. If there can be any items with weight 0, thus, that should never be picked, you will have to skip them at the top of the loop, because otherwise, the expression will barf (division by zero).

Re: How do I get a random index with weighting?
by perreal (Monk) on Nov 25, 2008 at 00:00 UTC
    something like the code below should be alright:
    my $h = { a => 0.1, b => 0.5, c => 0.4, }; for (1..10000) { $s{rand_key($h)}++; } print "output: ".join(" ", %s),"\n"; sub rand_key { my $h = shift; my $r = rand(); for (keys %$h) { $r-=$h->{$_}; return $_ if($r < 0); } return undef; } __END__ output: c 4024 a 1017 b 4959
    edit: thanks oshalla, corrected the comparison

      That has the advantage of just one rand().

      Things you have to look out for, though:

      • the total weight not being 1.0 (or some other "well known" value).

      • any of the weights being very, very much smaller than the other weights -- or anything else that will cause problems with floating point subtraction.

      • the final value $r being greater than the weight of the last item (due to rounding).

      BTW, I think the comparison should be $r < 0.

      One could stick the weight at the end of the list:

      use strict ; use warnings ; my $RUNS = 100_000 ; my $data = [[a => 0.1], [b => 0.5], [c => 0.4], [c => 0]] ; my %count; foreach (1 .. $RUNS) { my $total_weight = $data->[-1]->[1] ; if (!$total_weight) { $total_weight += $_->[1] for @$data ; $data->[-1]->[1] = $total_weight ; } ; my $pick; my $r = rand($total_weight) ; foreach (@$data) { if (($r -= $_->[1]) < 0) { $pick = $_->[0] ; la +st ; } ; } ; $count{$pick}++ ; } ; while (my ($k, $v) = each %count) { printf "%s: %5.2f%%\n", $k, 100 * $v / $RUNS ; } ; __END__ c: 39.89% a: 9.92% b: 50.19%

      Noting that we can duplicate the last value in the total weight entry, so in the (unlikely) event of not stopping on the last real entry, the last value will come from the weight entry.

      This is, like the other solutions, a linear search along your list of possible values. You can speed this up by putting the popular stuff earlier in the list. If that is still too slow -- you have lots of possible values and you're doing this a lot, you can sub divide the list into groups, and prefix each group with its total weight -- then step along the groups reducing $r as above, and then within the selected group increasing $r.... but only if you really have to !

Re: How do I get a random index with weighting?
by salva (Canon) on Nov 25, 2008 at 09:34 UTC
    If the probabilities precision is limited to 2 digits as for your sample data:
    my %odds = (a => 0.1, b => 0.5, c => 0.4 ); my @dize = map { ($_) x int(100 * $odds{$_}) } keys %odds; for (1..1000) { print $dize[rand @dize], "\n"; }
      I like the simplicity of this, and since the probabilities in my case are certain to add up to 1 (because I normalised them) then a 100-long array might be the simplest way.

      I have, however, learned a fair amount from everyone else's replies! Thanks all.