If you're choosing between n different values, it's very easy to make an order(n) routine to do this, as someone else has already shown.

But, you can do better. If you pre-build an array of partial frequency sums, then you have a sorted array, and you can do a binary search on it for the correct number. This would be faster, If n is relatively large, And if the frequency table doesn't change very much.

# Warning: Untested code, almost definitely has off by one errors! # Given a frequency table in @freq, build a summed frequency table. my $sum; @sf = map {$sum += $_} @freq; # Now we choose a random number in the range of total frequency sum my $target = rand($sum); # To find out what the Answer is (weighted freq from 0 to n), # we now find n such that $sf[n-1]< $target < $sf[n] # We keep track of the numbers we know $n is between my $a = 0; my $b = @sf; my $n; # repeat until we narrow down to one answer while ($a < $b) { $n = int(($a + $b) / 2); if ($sf[$n] < $target) { $a = $n; } elsif ($sf[$n-1] > $target) { $b = $n; } else { # $target < $sf[n] and $target > $sf[n-1], # so we have an answer. last; } } print "I found $n\n";
Like I said, this is not going to work as-is, but hopefully you get the idea. There is a greater overhead in the loop than doing a linear search, and a greater setup cost to build the array of summed frequencies. But the loop runs in order (log n) time instead of order (n) time, so for sufficiently large n, this can be a great improvement over a straight linear search (use benchmarking to make sure it's a Win).

If you have a known frequency distribution (ie the frequencies are all almost equal, or there are a lot of low frequencies and a few high ones), you may be able to get Even Faster results if you use some heuristics to choose a better method than "look halfway between my known upper and lower bounds for n." For example, if your frequencies are all pretty equal, you can get a better idea of a guess for $n by seeing how far away $sf[$n] is from $target, and moving $n an appropriate distance, instead of just going halfway.

Hope this helps. Does anyone know of a good binary search module?

Alan


In reply to Re: Biased random number selection by ferrency
in thread Biased random number selection by athomason

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.