Re: Weighted random selection
by almut (Canon) on Jul 28, 2008 at 22:57 UTC
|
| [reply] |
|
|
| [reply] |
Re: Weighted random selection
by moritz (Cardinal) on Jul 28, 2008 at 22:26 UTC
|
There's a much more efficient way if you know the total weight in advance (or calculate it):
use List::Util qw(sum);
my $total = sum values %$group;
my $picked;
for (keys %$group){
if (rand() < $group->{$_} / $total){
$picked = $_;
last;
}
# update: I forgot this line:
$total -= $group->{$_};
# without this line items later in the hash are chosen
# too seldom, and you might chose none at all. Very bad.
}
I somehow think it's not entirely fair, but I can't get a straight thought on stochastic at this time (0:30 AM here), maybe I can think of a prove (or disprove) tomorrow. | [reply] [d/l] |
|
|
Hmmm... Not totally sure this'll give me the right weighting. But it has inspired me to write:-
use List::Util qw(sum);
my $total = sum values %$group;
my $rnd = rand( $total );
my $runningtot = 0;
foreach ( keys %$group ) {
if ( $rnd > $runningtot && $rnd <= ( $runningtot + $group->{$_} )
+) {
$picked = $_;
last;
}#if
$runningtot += $group->{$_};
}#foreach
rand() is only called once. No array is made. Straight forward if and a last to drop out once found. Haven't tested for efficiency but I'm guessing it's pretty good.
Lyle | [reply] [d/l] |
|
|
| [reply] |
Re: Weighted random selection
by ysth (Canon) on Jul 29, 2008 at 01:07 UTC
|
Doesn't really address your overall efficiency question, but I see over and over people forgetting (or never having known) that push takes a LIST, not just a single scalar:
while ( $group->{$advert} > 0 ) {
push( @rndselect, $advert );
$group->{$advert}--;
}#while
becomes
push( @rndselect, ($advert) x $group->{$advert} );
| [reply] [d/l] [select] |
Re: Weighted random selection
by FunkyMonk (Bishop) on Jul 28, 2008 at 23:00 UTC
|
I'd just use what moritz suggested, but the code you posted can be written much more succinctly:
my %group = ( ad1 => 1, ad2 => 2, ad3 => 3 );
my @rndselect = map { ($_) x $group{$_} } keys %group;
$advertinfo->{advert} = $rndselect[rand @rndselect];
Update: I seem to remember that Knuth had something to say about this in The Art of Computer Programming (probably volume 1). I'll have a look for it tomorrow. | [reply] [d/l] |
Re: Weighted random selection
by BrowserUk (Patriarch) on Jul 28, 2008 at 23:10 UTC
|
A simple and efficient method is to build an array of items to pick where each item is replicated the number of times to match it's weighting. (Note: I factored the weights by 5 as that reduces the size of the array.);
Then you just pick (once!) from that array and your guaranteed to get darn close to your chosen weightings:
#! perl -slw
use strict;
use List::Util qw[ sum ];
our $N ||= 1e3;
my %group = ( ad1 => 1, ad2 => 2, ad3 => 4 );
my $t = sum values %group;
print "Expected";
for ( sort keys %group ) {
printf "$_ to be chosen %f%%\n", $group{ $_ } / $t * 100;
}
my( $key, $value, @choices );
push @choices, ($key) x $value while ( $key, $value ) = each %group;
my %picks;
for ( 1.. $N ) {
$picks{ @choices[ rand @choices ] } ++;
}
print "\nActual";
for ( sort keys %picks ) {
printf "$_ chosen %f%%\n", $picks{ $_ } / $N * 100;
}
__END__
C:\test>700683 -N=1e6
Expected
ad1 to be chosen 14.285714%
ad2 to be chosen 28.571429%
ad3 to be chosen 57.142857%
Actual
ad1 chosen 14.276200%
ad2 chosen 28.601600%
ad3 chosen 57.122200%
C:\test>700683 -N=1e6
Expected
ad1 to be chosen 14.285714%
ad2 to be chosen 28.571429%
ad3 to be chosen 57.142857%
Actual
ad1 chosen 14.309300%
ad2 chosen 28.577100%
ad3 chosen 57.113600%
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
|
Isn't this pretty much what I had in the first place?
| [reply] |
|
|
Isn't this pretty much what I had in the first place?
Having re-inspected your code, yes. Kinda. I didn't recognise as such the first time I looked.
That does raise a question though. Why do you think it is inefficient?
The cost of building @choices (your @rndselect) is a one-off up front cost. And hardly onorous at that unless you have thousand or millions of choices.
The actual runtime cost: $choices[ rand @choices ]; is as minimal as it is possible to be, and totally fair. One random number generation and one access to the array.
The only real alternative to it (for fairness) is the Knuth algorithm and from memory that requires you to:
- Pick a random number;
- traverse (on average) half the weights array (or hash) accumulating the weights until the sum is greater than the random number you picked.
- Then obtain the number (either from a parallel array or the hash).
So O(N(/2)) compared to O(1).
If you have sufficient choices that the memory for the array is a concern, then pack indexes into a string and cut your memory requirement to 1/8th.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
|
|
|
Re: Weighted random selection
by jethro (Monsignor) on Jul 28, 2008 at 22:23 UTC
|
To my shame I must confess I used the same algorithm with my mp3 player. And I didn't give a thought about efficiency.
If you do mind, you could just sum up the all the weights and do the looking up on a second walkthrough. Definitely more memory efficient and if your weights are above 2 on average it would be faster too.
You could also keep a running weight total that would be updated on every addition or deletion in the group.
| [reply] |
Re: Weighted random selection
by JavaFan (Canon) on Jul 29, 2008 at 11:24 UTC
|
Here's what I would do. It doesn't require constructing a (potentially) large array (which would only work if the weights are integers anyway), nor does it require two passes over the data (so the technique would also work when reading input from a stream).
The technique is discussed by Knuth.
my $pick;
my $weight;
while (my ($p, $w) = each %$group)
{
$weight += $w;
$pick = $p if rand ($weight) < $w;
}
#
# $pick now contains a weighted choice.
#
| [reply] [d/l] |