in reply to Weighted random selection

A simple and efficient method is to build an array of items to pick from, where each item is replicated the number of times needed to match its weighting. (Note: I divided the weights through by 5, as that reduces the size of the array.)

Then you just pick (once!) from that array, and you're guaranteed to get darn close to your chosen weightings:

#! perl -slw
use strict;
use List::Util qw[ sum ];

our $N ||= 1e3;

my %group = ( ad1 => 1, ad2 => 2, ad3 => 4 );
my $t = sum values %group;

print "Expected";
for ( sort keys %group ) {
    printf "$_ to be chosen %f%%\n", $group{ $_ } / $t * 100;
}

## Replicate each key by its weight to build the pick array.
my( $key, $value, @choices );
push @choices, ( $key ) x $value while ( $key, $value ) = each %group;

my %picks;
for ( 1 .. $N ) {
    ## One random number and one array access per pick.
    $picks{ $choices[ rand @choices ] }++;
}

print "\nActual";
for ( sort keys %picks ) {
    printf "$_ chosen %f%%\n", $picks{ $_ } / $N * 100;
}
__END__
C:\test>700683 -N=1e6
Expected
ad1 to be chosen 14.285714%
ad2 to be chosen 28.571429%
ad3 to be chosen 57.142857%

Actual
ad1 chosen 14.276200%
ad2 chosen 28.601600%
ad3 chosen 57.122200%

C:\test>700683 -N=1e6
Expected
ad1 to be chosen 14.285714%
ad2 to be chosen 28.571429%
ad3 to be chosen 57.142857%

Actual
ad1 chosen 14.309300%
ad2 chosen 28.577100%
ad3 chosen 57.113600%

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Weighted random selection
by cosmicperl (Chaplain) on Jul 28, 2008 at 23:58 UTC
    Isn't this pretty much what I had in the first place?
      Isn't this pretty much what I had in the first place?

      Having re-inspected your code, yes. Kinda. I didn't recognise it as such the first time I looked.

      That does raise a question though. Why do you think it is inefficient?

      The cost of building @choices (your @rndselect) is a one-off, up-front cost, and hardly onerous at that unless you have thousands or millions of choices.

      The actual runtime cost, $choices[ rand @choices ], is as minimal as it is possible to be, and totally fair: one random number generation and one access to the array.

      The only real alternative to it (for fairness) is the Knuth algorithm, which from memory requires you to do the following (a sketch in code follows the list):

      1. Pick a random number in the range [ 0, total weight );
      2. traverse (on average) half the weights array (or hash), accumulating the weights until the sum exceeds the random number you picked;
      3. then obtain the chosen item (either from a parallel array or the hash).
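
      A minimal sketch of that scan (my code, not Knuth's verbatim; it reuses the %group weights from the script above, and note it works just as well with fractional or irrational weights):

      use strict;
      use warnings;
      use List::Util qw[ sum ];

      my %group = ( ad1 => 1, ad2 => 2, ad3 => 4 );

      sub pick_weighted {
          my( $weights ) = @_;

          ## 1. Pick a random number in [ 0, total weight ).
          my $r = rand( sum values %$weights );

          ## 2. Accumulate weights until the running total passes $r;
          ##    on average this traverses half the entries.
          my $acc = 0;
          for my $key ( keys %$weights ) {
              $acc += $weights->{ $key };
              ## 3. Return the item whose weight tipped the sum past $r.
              return $key if $r < $acc;
          }
      }

      print pick_weighted( \%group ), "\n" for 1 .. 5;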

      So O(N/2) compared to O(1).

      If you have sufficient choices that the memory for the array is a concern, then pack indexes into a string and cut your memory requirement to 1/8th.
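
      For instance, a sketch of the packed-string idea (my code; it assumes no more than 256 distinct choices, so each index fits in a single byte):

      use strict;
      use warnings;

      my %group = ( ad1 => 1, ad2 => 2, ad3 => 4 );

      ## Map each key to a one-byte index and replicate that byte by its
      ## weight: one byte per entry instead of one full scalar each.
      my @keys    = sort keys %group;
      my $choices = '';
      $choices .= chr( $_ ) x $group{ $keys[ $_ ] } for 0 .. $#keys;

      ## The pick itself stays O(1): one random offset, one substr, one ord.
      my $pick = $keys[ ord substr( $choices, rand length $choices, 1 ) ];
      print "$pick\n";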


        The cost of building @choices (your @rndselect) is a one-off, up-front cost, and hardly onerous at that unless you have thousands or millions of choices.

        And impossible to do well if your weights are sqrt(2), sqrt(5) and sqrt(11).

        [Knuth vs BrowserUK] So O(N/2) compared to O(1).

        Not quite a fair comparison, is it? You give yourself a preprocessing complexity of Ω(R) (in both time and memory), where R is the sum of all weights (and could hence be much larger than N), while you don't give Knuth any.
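
        To put a number on that (weights mine, purely illustrative): the replicated array has to materialise one entry per unit of weight before the first pick, so R can dwarf N:

        use strict;
        use warnings;
        use List::Util qw[ sum ];

        my %group = ( ad1 => 1, ad2 => 1, ad3 => 1_000_000 );

        my $N = keys %group;        ## 3 weights for the Knuth scan to touch
        my $R = sum values %group;  ## 1_000_002 replicated entries up front

        print "N = $N, R = $R\n";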