Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re: Randomly choosing from a set of alternatives with varying popularity

by LanX (Saint)
on Mar 28, 2022 at 01:23 UTC ( [id://11142437]=note: print w/replies, xml ) Need Help??


in reply to Randomly choosing from a set of alternatives with varying popularity

> To favor the popular, you choose a random number N from 0 to 549, then traverse the set of animals, subtracting each one's popularity from N until N goes negative, and that's the animal you choose.

I don't think that's a good idea. More than 80% of your random numbers will be bigger than 100, which means none will be picked.

This also explains why you have problems inverting this "solution".

Furthermore a linear search in the hash won't be truly random, because the hash order is fixed within the duration of a run. This will result in a disproportionate representation of the early entries.

Solution is easy:

sort the N entries and chose a modified random function which favors one end of the range, like $i = int $N * rand() ** $e and $j = $N - 1 - $i for index in your sorted list. The way you chose $e > 1 will decide about the degree of bias ( $e=1 means no bias)

Just test the distribution of the indices:

DB<41> %h=();$e=3; $N=10; $h{ int $N * rand() **$e }++ for 1..10000 DB<42> x \%h 0 HASH(0x2f25fe8) 0 => 4670 1 => 1215 2 => 838 3 => 686 4 => 550 5 => 484 6 => 407 7 => 388 8 => 426 9 => 336 DB<43>

In this example the first 10% are more than 10 times more likely to be picked than the last 10%.

Of course YMMV about the distribution you prefer, but the basic idea is the same.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Replies are listed 'Best First'.
Re^2: Randomly choosing from a set of alternatives with varying popularity
by hv (Prior) on Mar 28, 2022 at 03:35 UTC

    I suspect you've misunderstood the 'popular' algorithm: 550 is the sum of the votes, so if (for example) the hash is traversed in order of decreasing popularity, rolling 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish. As far as I know this is pretty much the standard approach (up to optimizations) for picking from a set with weights.

      > rolling 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish.

      Indeed, I missed the fact that N is an accumulated sum where all votes so far are subtracted. That wasn't obvious from the description and the code wasn't concise.

      > As far as I know this is pretty much the standard approach

      I'm curious to know how my other objection is handled. The order in a Perl hash is random, but fixed during a run. So this algorithm will be biased to some results.

      Do you have a link to this "standard approach" discussing the distribution of picks?

      My guess is the list must be sorted in ascending order. Or to be more precise that the choice over ascending or descending decides over popular vs unpopular.

      > 0..99 will give alligator, 100..179 will give bear, etc., until 540..549 gives jellyfish.

      FWIW, the results you give can only be reproduced with a descending sort.

      > > > { alligator => 100, bear => 90, cat => 80, ... jellyfish => 10 }

      (well almost s/179/189/ )

      NB: the OP has uncommented the sort in his code.

      > > > # sort { ($votes{$b} <=> $votes{$a}) } #unnecessary

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        > I'm curious to know how my other objection is handled. The order in a Perl hash is random, but fixed during a run. So this algorithm will be biased to some results.

        OK, after a second coffee I finally got my had wrapped around it.

        Definition/Interpretation matters:

        The random pick $r between from 0..N-1 represents what the voter $r has chosen.

        Hence order doesn't matter as long as it is fixed, since a 100 wide interval will always have the same probability of 100/550, no matter where it occurs.˛

        But I don't see any intuitive way to invert this interpretation, because $r has not-voted to all other choices.

        Assigning "inverted" votes like suggested in your other post might work, as long as they sum up to 550 again°. But the question of the weighting factor and distribution will come up again.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        °) respectively N' is corrected.

        update

        ˛) IOW no matter if voters John and Jim swapped their place, their probability to be chosen stays the same.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11142437]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having an uproarious good time at the Monastery: (7)
As of 2024-04-19 10:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found