in reply to an algorithm to randomly pick items that are present at different frequencies

Your question is very unclear. Are you saying that you want to pick 'A' 1 in every million picks, 'B' 1 in every 20 million picks, and 'C' 1 in every 100,000 picks?

If so, what do you want to pick the other 89,286 times out of every 100,000 picks?

I think you need to clarify your question.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re^2: an algorithm to randomly pick items that are present at different frequencies
by efoss (Acolyte) on May 22, 2015 at 03:26 UTC
    Hi BrowserUk, sorry about the lack of clarity. I meant these to be relative frequencies, with something getting picked every time.

      Then something like this should do the trick:

      #! perl -slw
      use strict;
      use Data::Dump qw[ pp ];

      sub genPicker {
          my $fh = shift;
          my( @vals, @odds );
          ( $vals[ @vals ], $odds[ @odds ] ) = split( ' +' ) for <$fh>;

          ## Sort if not sorted
          my @order = sort{ $odds[ $a ] <=> $odds[ $b ] } 0 .. $#odds;
          @odds = @odds[ @order ];
          @vals = @vals[ @order ];

          ## Calculate and accumulate break points
          my $t = 0; $t += $_ for @odds;
          $_ /= $t for @odds;
          $odds[ $_ + 1 ] += $odds[ $_ ] for 0 .. $#odds - 1;

          ## Generate a subroutine to do the picking
          return sub {
              my $r = rand();
              $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds;
          };
      }

      my $pick = genPicker( *DATA );

      ## run a quick test
      my %tally;
      ++$tally{ $pick->() } for 1 .. 10e6;

      pp \%tally;

      __DATA__
      A 1e-7
      B 20e-7
      C 10e-5

      Produces:

      C:\test>1127420
      { A => 9949, B => 195307, C => 9794744 }

      C:\test>1127420
      { A => 10077, B => 196613, C => 9793310 }
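The break-point step is the heart of the approach, so it may help to see it in isolation with a fixed number in place of rand(). The following is only a minimal sketch, not the code posted above: the helper names cumulative_breaks and pick_at are made up for illustration, and it assumes the weights are positive numbers. Each weight is divided by the total and accumulated, so every item owns a slice of the range 0..1 proportional to its relative frequency.

```perl
use strict;
use warnings;

# Hypothetical helper: turn relative weights into cumulative break points,
# the last of which is (approximately) 1.
sub cumulative_breaks {
    my @weights = @_;
    my $total = 0;
    $total += $_ for @weights;

    my( $running, @breaks ) = ( 0 );
    for my $w ( @weights ) {
        $running += $w / $total;
        push @breaks, $running;
    }
    return @breaks;
}

# Hypothetical helper: return the first value whose break point exceeds $r.
sub pick_at {
    my( $r, $vals, $breaks ) = @_;
    for my $i ( 0 .. $#$breaks ) {
        return $vals->[ $i ] if $r < $breaks->[ $i ];
    }
    return $vals->[ -1 ];    # guard in case rounding leaves the last break just under 1
}

my @vals   = qw( A B C );
my @breaks = cumulative_breaks( 1e-7, 20e-7, 10e-5 );

# Show the upper end of each item's slice of 0..1
printf "%s: up to %.6f\n", $vals[ $_ ], $breaks[ $_ ] for 0 .. $#vals;

print pick_at( 0.0005, \@vals, \@breaks ), "\n";    # lands in A's slice
print pick_at( 0.01,   \@vals, \@breaks ), "\n";    # lands in B's slice
print pick_at( 0.5,    \@vals, \@breaks ), "\n";    # lands in C's slice
```

In real use the first argument to pick_at would be rand(). As far as I can tell, the order of the items does not affect the resulting probabilities, only which slice of 0..1 each item occupies.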

        Hi BrowserUk,

        Thanks very much for this. Unfortunately, this syntax is above my head, though I would like to understand it. For starters, what exactly am I passing to "genPicker"? A file handle? A file name? I made a space-delimited file with A, B and C in the first column and the frequencies in the second column and tried in various ways to pass that into the subroutine but without success.

        my $pick = genPicker( *DATA );

        What is "*DATA" here? I don't know how to get the "__DATA__" that you list near the bottom into *DATA form. Any help would be much appreciated.

        Best wishes, Eric
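One possible answer, offered as a sketch rather than the original poster's reply: genPicker reads from a file handle, not a file name. *DATA is Perl's built-in handle for whatever text follows the __DATA__ marker in the script's own source file, so nothing has to be done to "get" the data into it. To use a separate file instead, open it and pass the resulting handle. In the example below, read_weights is a hypothetical stand-in for the reading step of genPicker, and the temporary file only exists to make the sketch runnable on its own; in practice you would open your own space-delimited file.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical stand-in for the reading step of genPicker:
# takes an open file handle and returns the names and weights it finds.
sub read_weights {
    my $fh = shift;
    my( @vals, @odds );
    while( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\S/;              # skip blank lines
        my( $val, $odd ) = split ' ', $line;    # split on any whitespace
        push @vals, $val;
        push @odds, $odd;
    }
    return ( \@vals, \@odds );
}

# Write a small sample file so the sketch is self-contained.
# With your own file, skip this part and set $file to its name.
my( $out, $file ) = tempfile( UNLINK => 1 );
print {$out} "A 1e-7\nB 20e-7\nC 10e-5\n";
close $out or die "Cannot close $file: $!";

# Open the file for reading and pass the handle, not the name.
open my $fh, '<', $file or die "Cannot open $file: $!";
my( $vals, $odds ) = read_weights( $fh );
close $fh;

print "$vals->[ $_ ] $odds->[ $_ ]\n" for 0 .. $#$vals;
```

With the posted code, the equivalent call would be genPicker( $fh ) in place of genPicker( *DATA ), made after the open and before the close.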