in reply to Picking a random item through probability

I actually did something like this. I had an array,

my @prob = (0.1, 0.25, 0.03, 0.4, 0.22);
which gave the probability of each of five (in this contrived case) options.

my @options = qw(monday tuesday wednesday thursday friday);

My suggestion is something like this:

#!/usr/bin/perl
use strict;
use warnings;

my @likelihood = (1, 3, 4, 9);
my @option     = qw(foo bar baz quux);

my @sum;
$sum[0] = $likelihood[0];
foreach my $n (1 .. $#likelihood) {
    $sum[$n] = $sum[$n - 1] + $likelihood[$n];
}
# @sum = (1, 4, 8, 17);

my $n = int(rand($sum[-1]) + 0.1);    # random number <= 17
foreach my $i (0 .. $#sum) {
    if ($n <= $sum[$i]) {
        print $option[$i] . "\n";
        last;
    }
}

NOTE:  This code is not tested.

emc

At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

—Igor Sikorsky, reported in AOPA Pilot magazine February 2003.

Re^2: Picking a random item through probability
by Tanktalus (Canon) on Nov 24, 2006 at 03:26 UTC

    I've done this before, too, and I'd like to point out that the addition of 0.1 and the check of <= will throw off your probabilities. It may look right, and may even suffice for what you're doing, but it won't be the precise probabilities that you fed in at the top.

    Also, I'd like to discourage the idea of separating the weights from the options. It's far, far too easy to get the quantities out of sync. Perl makes anon hashes and arrays so easy that there's no excuse to do this.

    my @likelihood = (1, 3, 4, 9, 2, 8);
    my @option = qw(foo bar baz quux biz);
    Oops! Maybe if we lined it up better ...
    my @likelihood = (  1,   3,   4,    9,  2,   8);
    my @option     = qw(foo bar baz quux biz);
    Now it's obvious. But what if we have 50 items? It'll scroll off the right side of the screen and be practically impossible to deal with. Better to use an AoA or AoH instead.
    my @options = (
        { name => 'foo',  weight => 1 },
        { name => 'bar',  weight => 3 },
        { name => 'baz',  weight => 4 },
        { name => 'quux', weight => 9 },
        { name => 'biz',  weight => 2 },
    );
    Back to your probabilities ... remember that rand(n) gives you a random number such that 0 <= rand(n) < n, theoretically with even distribution. Adding 0.1 gives you a random number r with 0.1 <= r < n + 0.1, which you then truncate. With n = 17, that gives you a 0.9/17 (or about 5.3%) chance of getting 0, plus a 1/17 chance (or about 5.9%) of getting a 1, both of which (total of about 11.2%) will get you 'foo' in your case. Meanwhile, 'bar', with a weight of 3, has a 3/17 chance, or about 17.6%, of being selected, which is nowhere near the triple weight compared to 'foo' that one would expect. Finally, 'quux' gets the eight numbers 9 through 16 at 1/17 each, plus only a 0.1/17 chance (or about 0.6%) of getting 17, for a total of 8.1/17 (about 47.6%), which is less than the intended 9/17 (about 52.9%).
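
To check that arithmetic without relying on random sampling, here is a minimal sketch that enumerates every integer int(rand(17) + 0.1) can produce and adds up the exact probability each option receives under the original "<=" scan. It assumes rand is uniform on [0, n); the %prob hash and the loop layout are my own illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @likelihood = (1, 3, 4, 9);
my @option     = qw(foo bar baz quux);

# Running totals, as in the original script: (1, 4, 8, 17).
my @sum;
my $total = 0;
for my $w (@likelihood) {
    $total += $w;
    push @sum, $total;
}

# int(rand($total) + 0.1) can produce any integer k in 0 .. $total:
#   k == 0        has probability 0.9 / $total
#   k == $total   has probability 0.1 / $total
#   every other k has probability 1   / $total
my %prob;
for my $k (0 .. $total) {
    my $p = ($k == 0 ? 0.9 : $k == $total ? 0.1 : 1) / $total;

    # Same "<=" scan as the original script.
    for my $i (0 .. $#sum) {
        if ($k <= $sum[$i]) {
            $prob{ $option[$i] } += $p;
            last;
        }
    }
}

for my $i (0 .. $#option) {
    printf "%-5s actual %4.1f%%  intended %4.1f%%\n",
        $option[$i],
        100 * $prob{ $option[$i] },
        100 * $likelihood[$i] / $total;
}
```

The extra share that 'foo' picks up comes straight out of 'quux'; 'bar' and 'baz' are unaffected.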

    Better, instead, to stick to int(rand($total_weight)) and to compare with < instead of <=.
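
A minimal sketch of that approach, assuming positive integer weights stored in an array of hashes. The helper names pick_for_roll and pick_weighted are hypothetical, chosen only for this illustration; splitting out pick_for_roll just makes the mapping from roll to option easy to check by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a roll in 0 .. total-1 to an option name.
sub pick_for_roll {
    my ($roll, @options) = @_;
    my $running = 0;
    for my $opt (@options) {
        $running += $opt->{weight};
        # Strict "<": an option with weight w owns exactly w roll values.
        return $opt->{name} if $roll < $running;
    }
    return;    # only reached if $roll >= the total weight
}

# Hypothetical helper: pick one option name at random by weight.
sub pick_weighted {
    my @options = @_;
    my $total = 0;
    $total += $_->{weight} for @options;
    # int(rand($total)) is an integer in 0 .. $total - 1.
    return pick_for_roll(int(rand($total)), @options);
}

my @options = (
    { name => 'foo',  weight => 1 },
    { name => 'bar',  weight => 3 },
    { name => 'baz',  weight => 4 },
    { name => 'quux', weight => 9 },
);

print pick_weighted(@options), "\n";
```

With these weights the rolls run from 0 to 16: 'foo' owns roll 0, 'bar' owns 1 through 3, 'baz' owns 4 through 7, and 'quux' owns 8 through 16, so each option gets exactly weight/17.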

      Since today is a US holiday, and I did this for work, I've not got my source code available. Even if I could remember the exact code, it's not mine to put into a public place.

      The potential values and their corresponding probabilities (the sum of which was, of course, 1) were read from an output from some SAS analyses; neither was coded into the Perl program, so alignment wasn't an issue, and there were never more than about a half-dozen possible values.

      emc
