efoss has asked for the wisdom of the Perl Monks concerning the following question:

I want to randomly choose an item from a list of items, each of which has a different probability of being chosen. So essentially I have

A 10e-6

B 2x10e-7

C 10e-5

etc.

I want to reach in and choose an item randomly and to have the frequency with which I choose item A, B, C, etc. reflect the frequency that it is present. I can think of messy ways to do this, e.g. populate an array with items according to their frequencies and then randomly choose one item from the array. Can someone suggest a better approach? Thank you.
  • Comment on an algorithm to randomly pick items that are present at different frequencies

Replies are listed 'Best First'.
Re: an algorithm to randomly pick items that are present at different frequencies
by aaron_baugher (Curate) on May 22, 2015 at 01:55 UTC

    I'm going to assume you want something to get picked each time, so you want their probabilities to reflect their relative frequency values. In that case, in pseudo-code:

    loop through the lines parse out the key and the frequency save the key and frequency as a two-element array inside an array keep a running total of the frequencies set an accumulator to 0 loop through the array replace each frequency value with a value representing the frequency plus the accumulator, divided by the total of the frequencies add the original frequency to the accumulator get a random positive number less than 1 loop through the array one more time if the random number is less than the relative frequency value we've found our key, print it and exit the loop

    If that makes sense to you, try that and ask if you need help, showing the code you have so far.

    Aaron B.
    Available for small or large Perl jobs and *nix system administration; see my home node.

      Thanks very much, Aaron B. I got that to work - it's clever. Best wishes, Eric
Re: an algorithm to randomly pick items that are present at different frequencies
by BrowserUk (Patriarch) on May 22, 2015 at 01:08 UTC

    Your question is very unclear. Are you saying that you want to pick 'A': 1 in a million picks; 'B': 1 in every 20 million picks; and 'C': 1 in every 100,000 picks?

    If so, what do you want to pick the other 89,286 times out of every 100,000 picks?

    I think you need to clarify your question.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      Hi BrowserUK, Sorry about the lack of clarity. I meant these to be relative frequencies, with something getting picked every time.

        Then something like this should do the trick:

        #! perl -slw use strict; use Data::Dump qw[ pp ]; sub genPicker { my $fh = shift; my( @vals, @odds ); ( $vals[ @vals ], $odds[ @odds ] ) = split( ' +' ) for <$fh>; ## Sort if not sorted my @order = sort{ $odds[ $a ] <=> $odds[ $b ] } 0 .. $#odds; @odds = @odds[ @order ]; @vals = @vals[ @order ]; ## Calculate and accumulate break points my $t = 0; $t += $_ for @odds; $_ /= $t for @odds; $odds[ $_ + 1 ] += $odds[ $_ ] for 0 .. $#odds - 1; ## Generate a subroutine to do the picking return sub { my $r = rand(); $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds; }; } my $pick = genPicker( *DATA ); ## run a quick test my %tally; ++$tally{ $pick->() } for 1 .. 10e6; pp \%tally; __DATA__ A 1e-7 B 20e-7 C 10e-5

        Produces:

        C:\test>1127420 { A => 9949, B => 195307, C => 9794744 } C:\test>1127420 { A => 10077, B => 196613, C => 9793310 }

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked