Re^3: an algorithm to randomly pick items that are present at different frequencies

Then something like this should do the trick:

#! perl -slw
use strict;
use Data::Dump qw[ pp ];

sub genPicker {
    my $fh = shift;
    my( @vals, @odds ); ( $vals[ @vals ], $odds[ @odds ] ) = split( ' 
+' ) for <$fh>;

    ## Sort if not sorted
    my @order = sort{ $odds[ $a ] <=> $odds[ $b ] } 0 .. $#odds;
    @odds = @odds[ @order ];
    @vals = @vals[ @order ];

    ## Calculate and accumulate break points
    my $t = 0; $t += $_ for @odds;
    $_ /= $t for @odds;
    $odds[ $_ + 1 ] += $odds[ $_ ] for 0 .. $#odds - 1;

    ## Generate a subroutine to do the picking
    return sub {
        my $r = rand();
        $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds;
    };
}

my $pick = genPicker( *DATA );

## run a quick test
my %tally; ++$tally{ $pick->() } for 1 .. 10e6;

pp \%tally;

__DATA__
A 1e-7
B 20e-7
C 10e-5
[download]

Produces:

C:\test>1127420
{ A => 9949, B => 195307, C => 9794744 }

C:\test>1127420
{ A => 10077, B => 196613, C => 9793310 }
[download]

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this

In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Comment on Re^3: an algorithm to randomly pick items that are present at different frequencies Select or Download Code

Replies are listed 'Best First'.
Re^4: an algorithm to randomly pick items that are present at different frequencies by efoss (Acolyte) on May 27, 2015 at 05:40 UTC
Hi BrowserUk, Thanks very much for this. Unfortunately, this syntax is above my head, though I would like to understand it. For starters, what exactly am I passing to "genPicker"? A file handle? A file name? I made a space-delimited file with A, B and C in the first column and the frequencies in the second column and tried in various ways to pass that into the subroutine but without success. `my $pick = genPicker( DATA );` What is "DATA" here? I don't know how to get the "__DATA__" that you list near the bottom into *DATA form. Any help would be much appreciated. Best wishes, Eric	[reply] [d/l]
Re^5: an algorithm to randomly pick items that are present at different frequencies by BrowserUk (Patriarch) on May 27, 2015 at 05:46 UTC
what exactly am I passing to "genPicker"? A file handle? You answered your own question :) Yes, its a file handle. `*DATA` is a (pseudo)filehandle that allows access the 'file' after __DATA__. So,if you had your input in a file called numbs.dat, you do this: `... open my $fh, '<', 'numbs.dat' or die $!; my $pick = genPicker( $fh ); ## Reads the file and generates a picker +subroutine according to its contents. close $fh; ### use $pick->() each time you want a new random number.` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked	[reply] [d/l] [select]
Re^6: an algorithm to randomly pick items that are present at different frequencies by efoss (Acolyte) on May 27, 2015 at 19:01 UTC
Hi BrowserUk, I'm still running into trouble. Here is my code: #!/usr/bin/perl -w use strict; use Data::Dump qw[ pp ]; open my $fh, '<', 'numbs.dat' or die $!; my $pick = genPicker( $fh ); ## Reads the file and generates a picker sub genPicker { my $fh = shift; my( @vals, @odds ); ( $vals[ @vals ], $odds[ @odds ] ) = split( ' +' ) for <$fh>; ## Sort if not sorted my @order = sort{ $odds[ $a ] <=> $odds[ $b ] } 0 .. $#odds; @odds = @odds[ @order ]; @vals = @vals[ @order ]; ## Calculate and accumulate break points my $t = 0; $t += $_ for @odds; $_ /= $t for @odds; $odds[ $_ + 1 ] += $odds[ $_ ] for 0 .. $#odds - 1; ## Generate a subroutine to do the picking return sub { my $r = rand(); $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds; }; } close $fh; [download] Here is my numbs.dat file (I changed the numbers to simplify things): `A 0.0001 B 0.0004 C 0.0008` [download] For what it's worth, there is a new line after each row except the last, and each letter is separated from the corresponding number by a single space. I think something is going wrong with my split, because if I pause the script and look into the variables, my @odds array has three slots, each of which is undefined, whereas my @vals array has three slots, each of which contains a letter, a space, a number and then for the first two slots a new line character. Do you see what's wrong? Best wishes, Eric	[reply] [d/l] [select]
Re^7: an algorithm to randomly pick items that are present at different frequencies by BrowserUk (Patriarch) on May 27, 2015 at 22:33 UTC
Re^8: an algorithm to randomly pick items that are present at different frequencies by efoss (Acolyte) on May 28, 2015 at 04:54 UTC
Re^8: an algorithm to randomly pick items that are present at different frequencies by efoss (Acolyte) on Jun 04, 2015 at 20:30 UTC
Some notes below your chosen depth have not been shown here