#! perl -slw
use strict;
use Data::Dump qw[ pp ];
our $N ||= 1e4;
sub pickGen {
    my $scale = shift;
    my $ref = @_ > 1 ? \@_ : $_[ 0 ];
    my $total = 0;
    $total += $_ for @$ref;
    ## Build the "pick-stick": a packed string in which each index
    ## appears a number of times proportional to its weight.
    my $pickStick = pack 'C*', map {
        ( $_ ) x ( $scale * $ref->[ $_ ] * 10 / $total )
    } 0 .. $#$ref;
    ## A pick is then just reading one random byte of that string.
    return sub {
        return ord substr $pickStick, rand length $pickStick, 1;
    };
}
my @items = 'A' ..'C';
## Your "difficult" weights:
my @weights = map sqrt $_, 2, 5, 11;
## Only required for testing
my $total = 0;
$total += $_ for @weights;
print "Required weights(%):\n\t", join"\t", map $_ *100 / $total, @w
+eights;
for my $scale ( 1, 10, 100 ) {
    ## Generate a picker (cost amortized over $N uses)
    my $picker = pickGen $scale, @weights;
    ## use it $N times
    my %stats;
    $stats{ $items[ $picker->() ] }++ for 1 .. $N;
    ## Check the results
    print "\nResults for $N picks (scale:$scale)\n\t",
        join "\t\t\t", map $stats{ $_ } * 100 / $N, @items;
}
This produces the following results over 1,000 and 1 million picks:
c:\test>weightedPick.pl -N=1e3
Required weights(%):
        20.2990178902944        32.0955653989182        47.6054167107874
Results for 1e3 picks (scale:1)
21.5 34 44.5
Results for 1e3 picks (scale:10)
19.4 32.5 48.1
Results for 1e3 picks (scale:100)
19.5 29.5 51
c:\test>weightedPick.pl -N=1e6
Required weights(%):
        20.2990178902944        32.0955653989182        47.6054167107874
Results for 1e6 picks (scale:1)
22.2248 33.3686 44.4066
Results for 1e6 picks (scale:10)
20.2238 32.3209 47.4553
Results for 1e6 picks (scale:100)
20.1913 32.0977 47.711
By all means do more thorough testing if you think it is warranted.
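If you do want something more rigorous than eyeballing percentages, a chi-squared goodness-of-fit check is one option. A rough sketch (the chiSquared sub is my own illustration, not part of the script above; the observed counts are taken from the 1e3-pick, scale:100 run):

use strict;
use warnings;

## Chi-squared goodness-of-fit: sum (observed - expected)**2 / expected
## over the categories, then compare against the critical value for
## (categories - 1) degrees of freedom (5.991 for 2 d.f. at p = 0.05).
sub chiSquared {
    my( $observed, $expected ) = @_;
    my $chi2 = 0;
    $chi2 += ( $observed->[ $_ ] - $expected->[ $_ ] )**2 / $expected->[ $_ ]
        for 0 .. $#$observed;
    return $chi2;
}

## Observed counts from the 1e3-pick, scale:100 run above, and the
## expected counts implied by the required weights.
my @observed = ( 195, 295, 510 );
my @weights  = map sqrt $_, 2, 5, 11;
my $total    = 0;
$total += $_ for @weights;
my @expected = map 1e3 * $_ / $total, @weights;

printf "chi-squared = %.3f\n", chiSquared( \@observed, \@expected );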
Knuth vs BrowserUK
Hardly my algorithm since the OP was already using it. I might have reinvented it for myself, but it's not rocket science.
For Knuth's algorithm there is no pre-processing needed (nor any that can be done).
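That is, presumably, the draw-one-random-number-and-walk-the-running-total scan; something along these lines (a rough sketch only -- the sub name knuthPick and the little test harness are mine, not code from the thread):

use strict;
use warnings;

## Linear-scan weighted pick: draw one random number in [0, total) and
## walk the weights, subtracting each until the number goes negative.
## No pre-processing; the whole cost is paid on every call.
sub knuthPick {
    my @weights = @_;
    my $total = 0;
    $total += $_ for @weights;
    my $r = rand $total;
    for my $i ( 0 .. $#weights ) {
        return $i if ( $r -= $weights[ $i ] ) < 0;
    }
    return $#weights;    ## floating-point safety net
}

my @weights = map sqrt $_, 2, 5, 11;
my %stats;
$stats{ knuthPick( @weights ) }++ for 1 .. 1e5;
print "index $_: $stats{ $_ } picks\n" for sort keys %stats;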
As for the pre-processing involved in my(*) algorithm: given that the second test above does 3 pre-processing passes and 3 million picks in under 4 seconds, unless you are doing millions of picks (over which the pre-processing cost will be amortised), there is no point in worrying about efficiency. And once so amortised, the (R < sum of weights due to scaling) cost is fractional.
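If you want to quantify that on your own hardware, the core Benchmark module makes it easy. A rough, self-contained sketch (the scanPick sub and the scale of 100 are my own illustrative choices; the 4-second figure above was not produced with this):

use strict;
use warnings;
use Benchmark qw[ cmpthese ];

my @weights = map sqrt $_, 2, 5, 11;
my $total = 0;
$total += $_ for @weights;

## Pre-processed pick-stick: the pack() cost is paid once, up front.
my $pickStick = pack 'C*', map {
    ( $_ ) x ( 100 * $weights[ $_ ] * 10 / $total )
} 0 .. $#weights;

## Linear scan with no pre-processing: the total and the walk are
## recomputed on every single pick.
sub scanPick {
    my $t = 0;
    $t += $_ for @weights;
    my $r = rand $t;
    my $i = 0;
    $i++ while ( $r -= $weights[ $i ] ) >= 0 and $i < $#weights;
    return $i;
}

cmpthese( -3, {
    pickStick => sub { ord substr $pickStick, rand length $pickStick, 1 },
    scan      => sub { scanPick() },
} );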
Even in the OP's scenario, where he is perhaps only making one pick per (cgi) run--and if so, what does the efficiency of one pick mean relative to the start-up and network costs anyway?--there is no need to do the pre-processing on every run. It only needs to be done when the weights change.
The item list & weights must be loaded in from somewhere--eg. a file or DB--and the pre-processing could be..um, well...pre-processed. And the items read in as a pre-expanded list or string. So the pre-processing again gets amortised over whatever number of picks occur between weight changes.
Either way you slice it, if efficiency really is a consideration, then an efficient implementation of my(*) algorithm will beat Knuth's (for this use).
(*) That's very tongue in cheek!
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.