in reply to Re: Weighted random selection
in thread Weighted random selection
Re^3: Weighted random selection
by BrowserUk (Patriarch) on Jul 29, 2008 at 00:57 UTC
Isn't this pretty much what I had in the first place?

Having re-inspected your code, yes. Kinda. I didn't recognise it as such the first time I looked.

That does raise a question though. Why do you think it is inefficient? The cost of building @choices (your @rndselect) is a one-off, up-front cost, and hardly onerous at that unless you have thousands or millions of choices. The actual runtime cost:

    $choices[ rand @choices ];

is as minimal as it is possible to be, and totally fair. One random number generation and one access to the array.

The only real alternative to it (for fairness) is the Knuth algorithm, and from memory that requires you to: generate a random number in the range 0 to the sum of the weights, then walk the list of weights, accumulating, until the running total passes the random number -- on average half the list per pick. So O(N/2) compared to O(1).

If you have sufficient choices that the memory for the array is a concern, then pack indexes into a string and cut your memory requirement to 1/8th. (A sketch of both follows below.)

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
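For concreteness, a minimal sketch of the expanded-array approach described above, plus the packed-string variant. The item names and weights are made up for illustration; this is not the OP's or BrowserUk's actual code.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical items and integer weights, for illustration only.
    my %weights = ( A => 1, B => 3, C => 6 );

    # One-off, up-front cost: repeat each item once per unit of weight.
    my @choices;
    push @choices, ( $_ ) x $weights{ $_ } for keys %weights;

    # Per-pick cost: one rand() and one array access -- O(1).
    my $pick = $choices[ rand @choices ];

    # Packed-string variant: one byte per slot instead of one scalar,
    # cutting the memory requirement to roughly 1/8th (items must be
    # single characters here; pack indexes instead for larger sets).
    my $pool  = join '', map { $_ x $weights{ $_ } } keys %weights;
    my $pick2 = substr $pool, rand length $pool, 1;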
by JavaFan (Canon) on Jul 29, 2008 at 11:31 UTC
And impossible to do well if your weights are sqrt(2), sqrt(5) and sqrt(11).

[Knuth vs BrowserUK] So O(N/2) compared to O(1).

Not quite a fair comparison, is it? You give yourself a preprocessing complexity of Ω(R) (in both time and memory), where R is the sum of all weights (and could hence be much larger than N), while you don't give Knuth any.
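For comparison, a sketch of the linear-scan method the thread attributes to Knuth: no pre-processing beyond summing the weights, non-integer weights are no problem, but each pick walks on average half the weight list. Item names are illustrative only.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Irrational weights need no expansion here.
    my %weights = ( A => sqrt 2, B => sqrt 5, C => sqrt 11 );

    sub weighted_pick {
        my ( $w ) = @_;
        my $total = 0;
        $total += $_ for values %$w;

        # Walk the weights, subtracting, until we cross zero:
        # O(N) worst case, N/2 on average.
        my $r = rand $total;
        for my $item ( keys %$w ) {
            return $item if ( $r -= $w->{ $item } ) < 0;
        }
        return ( keys %$w )[ -1 ];   # guard against float drift
    }

    print weighted_pick( \%weights ), "\n" for 1 .. 5;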
by BrowserUk (Patriarch) on Jul 29, 2008 at 12:31 UTC
And impossible to do well if your weights are sqrt(2), sqrt(5) and sqrt(11).

Not at all. It just requires a little scaling; something along the lines of the sketch below.
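The thread's original test code is not reproduced here; this is a reconstruction of the scaling idea, with the scale factor of 1e6 and the item names chosen for illustration.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Scale the irrational weights up and truncate to integers,
    # then expand into a packed string as before.
    my %weights = ( A => sqrt 2, B => sqrt 5, C => sqrt 11 );
    my $scale   = 1e6;

    my $pool = join '',
        map { $_ x int( $weights{ $_ } * $scale ) } keys %weights;

    # Count pick frequencies over 1,000 and 1,000,000 picks.
    for my $picks ( 1_000, 1_000_000 ) {
        my %count;
        $count{ substr( $pool, rand length $pool, 1 ) }++ for 1 .. $picks;
        printf "%9d picks: %s\n", $picks,
            join ', ', map { "$_=$count{$_}" } sort keys %count;
    }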
That produces frequencies proportional to the weights over 1,000 and 1 million picks.
By all means do more thorough testing if you think it is warranted.

Knuth vs BrowserUK

Hardly my algorithm, since the OP was already using it. I might have reinvented it for myself, but it's not rocket science.

For Knuth's algorithm, there is no pre-processing needed (or that can be done). As for the pre-processing involved in my(*) algorithm: given that the second test above does 3 preprocesses and 3 million picks in < 4 seconds, unless you are doing millions of picks (over which the preprocessing cost will be amortised), there is no point in worrying about efficiency. And once so amortised, the cost (where R < the sum of the weights, due to scaling) is fractional.

Even in the OP's scenario, where he is perhaps only making one pick per (CGI) run -- and if so, what does the efficiency of one pick mean relative to the start-up and network costs anyway? -- there is no need to do the pre-processing on every run. It only needs to be done when the weights change. The item list & weights must be loaded in from somewhere -- e.g. a file or DB -- and the pre-processing could be..um, well...pre-processed, and the items read in as a pre-expanded list or string. So the pre-processing again gets amortised over however many picks occur between weight changes (sketched below).

Either way you slice it, if efficiency is really a consideration, then an efficient implementation of the my(*) algorithm will beat Knuth's (for this use).

(*) That's very tongue in cheek!

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
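A sketch of that amortisation, caching the expanded pool in a file. The file name, scale factor, and weights are all hypothetical; rebuild only when the weights change, and each CGI run just slurps the string and picks.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $cache = '/tmp/weighted_pool.dat';   # hypothetical location

    # Run this only when the weights change.
    sub rebuild_pool {
        my ( $weights, $scale ) = @_;
        my $pool = join '',
            map { $_ x int( $weights->{ $_ } * $scale ) } keys %$weights;
        open my $fh, '>', $cache or die "open: $!";
        print {$fh} $pool;
        close $fh;
    }

    # Run this on every (CGI) request: no pre-processing at all.
    sub pick_one {
        open my $fh, '<', $cache or die "open: $!";
        local $/;                            # slurp mode
        my $pool = <$fh>;
        close $fh;
        return substr $pool, rand length $pool, 1;
    }

    rebuild_pool( { A => sqrt 2, B => sqrt 5, C => sqrt 11 }, 1e6 );
    print pick_one(), "\n";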