in reply to Re^9: an algorithm to randomly pick items that are present at different frequencies
in thread an algorithm to randomly pick items that are present at different frequencies

Here it is:
sub genPickerConverted { my %kmer_prob = %{ $_[0] }; my @vals = keys %kmer_prob; my @odds = (); foreach my $val (@vals) { push (@odds, $kmer_prob{$val}); } my @order = sort{ $odds[ $a ] <=> $odds[ $b ] } 0 .. $#odds; @odds = @odds[ @order ]; @vals = @vals[ @order ]; my $t = 0; $t += $_ for @odds; $_ /= $t for @odds; $odds[ $_ + 1 ] += $odds[ $_ ] for 0 .. $#odds - 1; return sub { my $r = rand(); $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds; }; }
  • Comment on Re^10: an algorithm to randomly pick items that are present at different frequencies
  • Download Code

Replies are listed 'Best First'.
Re^11: an algorithm to randomly pick items that are present at different frequencies
by BrowserUk (Patriarch) on Jun 04, 2015 at 22:35 UTC

    Okay. I took your version of the sub -- which I don't see anything wrong with; though it could be simplified somewhat -- and plugged it into my testscript from above:

    #! perl -slw use strict; use Data::Dump qw[ pp ]; sub genPickerConverted { my %kmer_prob = %{ $_[0] }; my @vals = keys %kmer_prob; my @odds = (); foreach my $val (@vals) { push (@odds, $kmer_prob{$val}); } my @order = sort{ $odds[ $a ] <=> $odds[ $b ] } 0 .. $#odds; @odds = @odds[ @order ]; @vals = @vals[ @order ]; my $t = 0; $t += $_ for @odds; $_ /= $t for @odds; $odds[ $_ + 1 ] += $odds[ $_ ] for 0 .. $#odds - 1; return sub { my $r = rand(); $r < $odds[ $_ ] and return $vals[ $_ ] for 0 .. $#odds; }; } my %kmer_probe = map{ split( ' ' ) } <DATA>; pp \%kmer_probe; my $picker = genPickerConverted( \%kmer_probe ); my %tally; ++$tally{ $picker->() } for 1 .. 1e6; pp \%tally; __DATA__ A 1e-7 B 20e-7 C 10e-5

    And it produces:

    C:\test>junk997.pl { A => 1e-7, B => 20e-7, C => 10e-5 } { A => 971, B => 19509, C => 979520 }

    Which is exactly what I'd expect.

    So, as I don't really understand what you mean by:

    why does my code not get in there when I pass my vals and odds to the subroutine as a hash rather than a file handle?

    You're going to have to clarify what you mean by that.

    However, having re-read your prior post, I saw something that didn't mean anything at my first reading:" but then it never enters the "return" block."; and that maybe the clue to your confusion.

    The anonymous subroutine that is returned by the function will not be entered at that time. The return statement is return a reference to that anonymous subroutine, that gets assigned to the variable my $picker = genPickerConverted( \%kmer_probe ); in the main program.

    That subroutine doesn't get executed (entered) until you dereference the $picker variable by doing:  $picker->();. Only then does teh subroutine get run.

    Does that explain your problem?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      Yes, that was it. I'm new to the arrow notation, anonymous subroutines, etc., and I'm afraid I'm still a bit puzzled by this, but adding the second line here in the main body of the script makes things work:

      my $pick = genPickerConverted(\%kmer_prob); $pick = $pick -> ();

      (Before I had only the first line.) When I talk about my code never entering the return block in the anonymous subroutine, I just mean that I add a pause/block in the debugger I'm using (Komodo) and then step through the code, looking at the contents of the variables as I go. When I'm stepping through within the "genPickerConverted" subroutine, I see all the variables being created as expected, but then when it gets to the anonymous subroutine, it just jumps back to the main body of the script. Then looking in "$pick", it looks empty. However, I guess I should have gotten a clue because in the "type" column, all of my hashes are listed as "HASH", all of my arrays are listed as "ARRAY", and all of my scalars are listed as not having a "type", except for $pick, which has type "CODE", and if I print out pick before the dereference statement with the arrow, I get a memory location, like "CODE(0x7f9af358dcb8)".

      So what's going on? Does $pick contain a reference to the anonymous subroutine, and it also remembers what was in my %kmer_prob hash?

      Thanks so very much for the help. I really appreciate it, I've learned a lot, and I was truly stuck.

      Eric

        Does $pick contain a reference to the anonymous subroutine, and it also remembers what was in my %kmer_prob hash?

        Yes, & yes.

        I suggest that you read section 4 of Perlref - Making references, carefully.

        But, in a nutshell, when you create a subroutine (named or anonymous) it 'remembers' any variables external to that subroutine that it references.

        So, this subroutine:

        my $var = 2; sub x{ print $var; } x(); # prints 2 $var = 3; x(); # prints 3

        For safety, you should isolate closed-over variables so they cannot be changed:

        { my $var = 2; sub x{ print $var }; } ## $var goes out of scope; but x() remembers it. x(); # prints 2; # $var = 3; ## Would be an error my $var = 3; x(); ## Still prints 2; the $var above is a different $var.

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      Why linear search? hehehe, snicker :)

      #!/usr/bin/perl # http://perlmonks.org/?node_id=1127420 use Search::Dict; use warnings; use strict; my $max = 0; my $file = join '', map s/(\S*)$/ $max += $1 /er, <DATA>; open my $fh, '<', \$file or die "open failure $!"; my $picker = sub { look $fh, rand $max, {comp => sub {shift =~ s/.* //r <=> shift}}; (split ' ', <$fh>)[0]; }; my %tally; # now test it $tally{$picker->()}++ for 1 .. 1e4; use YAML; print Dump \%tally; __DATA__ A 1e-7 B 20e-7 C 10e-5