in reply to Re^7: an algorithm to randomly pick items that are present at different frequencies
in thread an algorithm to randomly pick items that are present at different frequencies

Hi BrowserUk,

I have another question about your code, specifically about this piece:

    return sub {
        my $r = rand();
        $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds;
    };

I described my problem as having a list of values - A, B, C, etc. - and relative odds corresponding to choosing those values. I have tried to incorporate your code into a script in which I pass a %vals_odds hash to your genPicker subroutine. Things seem to go well until the return statement, but then nothing is returned with this statement in the main body of my script:

my $pick = genPickerConverted(\%kmer_prob);

And if I step through the code, right before I would enter the "return" block above, my @odds array has cumulative relative odds in it (so it ends with a 1, as it should), but then it never enters the "return" block.

My understanding of the return block (which looked fairly foreign to me when I saw it) is as follows:

    # return something that is going to come from ...
    # ... an unnamed subroutine (unnamed ...
    # ... because there's nothing between "sub" and "{")
    return sub {
        # r is a random number >= 0 < 1
        my $r = rand();
        # an implicit if statement:
        # if, when going through every value of odds from lowest ...
        # ... to highest, r is less than that value of odds, this ...
        # ... code will go on to the "and" statement, and otherwise...
        # ... it will go on to the next value in @odds
        # if it gets to the "and" statement, it will return the ...
        # ... corresponding value for @vals to the subroutine ...
        # ... call, which will, in turn, return that to the main ...
        # body of the script
        $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds;
    };

Is my understanding of the "return" block correct? And why does my code not get in there when I pass my vals and odds to the subroutine as a hash rather than a file handle? (I've changed the code so that I can tell that my hash gets in there correctly and is converted to @vals and @odds as I expect.) I'd include more code except that it gets really long and, I think, just adds confusion to my question.

Thanks.

Eric

Re^9: an algorithm to randomly pick items that are present at different frequencies
by BrowserUk (Patriarch) on Jun 04, 2015 at 20:41 UTC
    Is my understanding of the "return" block correct?

    Yes. A quick read of your comments suggests you've got that bit perfectly.
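For readers who find the statement-modifier form hard to parse, here is a minimal long-hand sketch of the same loop. The data and the name pick_longhand are hypothetical, and $r is passed in as an argument (rather than drawn from rand()) only so the result can be checked by hand:

```perl
use strict;
use warnings;

# Hypothetical example data: cumulative odds that end in 1.
my @vals = ( 'A', 'B', 'C' );
my @odds = ( 0.1, 0.3, 1.0 );

# Long-hand version of:
#   $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds;
sub pick_longhand {
    my( $r ) = @_;
    for my $i ( 0 .. $#odds ) {
        # Return the value for the first cumulative odd that exceeds $r.
        if( $r < $odds[ $i ] ) {
            return $vals[ $i ];
        }
    }
    # Only reached if $r is not below any entry, which should not
    # happen when the last cumulative value is 1 and 0 <= $r < 1.
    return;
}

print pick_longhand( 0.05 ), "\n";    # A
print pick_longhand( 0.25 ), "\n";    # B
print pick_longhand( 0.99 ), "\n";    # C
```

The "and" in the one-liner plays the role of the if: the return on its right-hand side is only evaluated when the comparison on its left is true.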

    However, as you are passing the values/odds into the sub via hash (instead of reading them from a file), then you'll need to show me your whole definition of genPickerConverted() so I can see what's going on before the bit you've posted.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      Here it is:
      sub genPickerConverted {
          my %kmer_prob = %{ $_[0] };
          my @vals = keys %kmer_prob;
          my @odds = ();
          foreach my $val (@vals) {
              push (@odds, $kmer_prob{$val});
          }
          my @order = sort{ $odds[ $a ] <=> $odds[ $b ] } 0 .. $#odds;
          @odds = @odds[ @order ];
          @vals = @vals[ @order ];
          my $t = 0;
          $t += $_ for @odds;
          $_ /= $t for @odds;
          $odds[ $_ + 1 ] += $odds[ $_ ] for 0 .. $#odds - 1;
          return sub {
              my $r = rand();
              $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds;
          };
      }

        Okay. I took your version of the sub (which I don't see anything wrong with, though it could be simplified somewhat) and plugged it into my test script from above:

        #! perl -slw
        use strict;
        use Data::Dump qw[ pp ];

        sub genPickerConverted {
            my %kmer_prob = %{ $_[0] };
            my @vals = keys %kmer_prob;
            my @odds = ();
            foreach my $val (@vals) {
                push (@odds, $kmer_prob{$val});
            }
            my @order = sort{ $odds[ $a ] <=> $odds[ $b ] } 0 .. $#odds;
            @odds = @odds[ @order ];
            @vals = @vals[ @order ];
            my $t = 0;
            $t += $_ for @odds;
            $_ /= $t for @odds;
            $odds[ $_ + 1 ] += $odds[ $_ ] for 0 .. $#odds - 1;
            return sub {
                my $r = rand();
                $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds;
            };
        }

        my %kmer_probe = map{ split( ' ' ) } <DATA>;
        pp \%kmer_probe;

        my $picker = genPickerConverted( \%kmer_probe );

        my %tally;
        ++$tally{ $picker->() } for 1 .. 1e6;
        pp \%tally;

        __DATA__
        A 1e-7
        B 20e-7
        C 10e-5

        And it produces:

        C:\test>junk997.pl
        { A => 1e-7, B => 20e-7, C => 10e-5 }
        { A => 971, B => 19509, C => 979520 }

        Which is exactly what I'd expect.

        So, as I don't really understand what you mean by:

        why does my code not get in there when I pass my vals and odds to the subroutine as a hash rather than a file handle?

        You're going to have to clarify what you mean by that.

        However, having re-read your prior post, I saw something that didn't mean anything to me on first reading: "but then it never enters the "return" block."; and that may be the clue to your confusion.

        The anonymous subroutine that is returned by the function will not be entered at that time. The return statement returns a reference to that anonymous subroutine, which gets assigned to the variable $picker by the line my $picker = genPickerConverted( \%kmer_probe ); in the main program.

        That subroutine doesn't get executed (entered) until you dereference the $picker variable by doing: $picker->();. Only then does the subroutine get run.
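A minimal sketch of that behaviour, using a hypothetical make_sub in place of genPickerConverted and a counter ($entered, also hypothetical) that records when the anonymous subroutine body actually runs:

```perl
use strict;
use warnings;

# Counts how many times the anonymous subroutine body has been entered.
my $entered = 0;

# Hypothetical stand-in for genPickerConverted: it builds and returns
# a code reference, but does not run it.
sub make_sub {
    my $label = shift;
    return sub {
        ++$entered;
        return "picked $label";
    };
}

my $picker = make_sub( 'A' );
print "after construction: entered = $entered\n";    # still 0
print "picker is a ", ref( $picker ), " reference\n";

# Only this call enters the anonymous subroutine.
my $result = $picker->();
print "after the call: entered = $entered, result = $result\n";
```

Stepping through make_sub in a debugger shows the same thing: the body of the inner sub is skipped when the reference is created, and is only stepped into at the $picker->() call.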

        Does that explain your problem?

