in reply to Re: Picking a random item through probability
in thread Picking a random item through probability

Actully a hash does make sense if it has been used to accumulate usage statistics:

++$hash{$_} for @stuff;

but, as you almost suggest, the hash should look like:

my %hash = ( foo => 1, bar => 3, baz => 4, quux => 9, );

DWIM is Perl's answer to Gödel

Replies are listed 'Best First'.
Re^3: Picking a random item through probability
by Tanktalus (Canon) on Nov 23, 2006 at 21:37 UTC

    Actually, even this is problematic. Because you need to go through the list in a defined order to ensure that each item has its respective probabilities. So you really do need the array refs ... but you need them in a list, not a hash.

    my @items = ( ["foo", 1], ["bar", 3], ["baz", 4], ["quux", 9] );
    Then, when you generate a number, you first loop through summing $_->[1]:
    use List::Util; my $total_weight = List::Util::sum(map { $_->[1] } @items);
    and then you generate a number from that, and then find the item in the total:
    my $pick = int(rand($total_weight)); my $picked_item = (List::Util::first { $pick -= $_->[1]; $pick < 0 } + @items)->[0];
    Ok, that's a bit convoluted, but I don't quite want to do someone's homework ;-) Anyway, the point is that if, when iterating over the hash, you end up with the hash being reordered from lookup to lookup, I don't think you'll get precisely the correct distribution. (Of course, I'm assuming a perfectly random rand function, but that is a separate problem.)

    Update: fix some minor formatting, and off-by-one error (was $pick <= 0)

      Woaw, impressive :) So concise, yet so clear. Not to mention that it actually works too!

      I read up about List::Util::first and I wish I knew about it earlier, that's a hell of an awesome subroutine (in fact the whole module is pretty cool, too).

      At first I was a bit puzzled about the (L::U::f)->[0] thing, but of course - all elements are in fact two elements each, and at this point of the progress we only care about the first. Nifty how you put all that in just one line without becoming too obscure.

      So much for flattering around, because there's still one part I don't quite understand. They do it in Weighted random numbers generator too. What does $pick dropping below zero have to do with anything? How does it work? I tried to figure it out by running the code mentally, but I just don't understand...

      Update: also, let it be known it's no homework :) I write Perl just as an hobby. I'm not really into computer science or whatsoever - I mean, I understand something like this can't be hard and if it would've been a homework assignment I would be ashamed I couldn't figure it out by myself :)

        List::Util - yeah, it was other monks who introduced me to this module. I've made good use of it since ;-)

        Back to your question ... Let's say that the number we get is "5". Looking at this list, that should result in picking "baz". How do we do that? We look at the numbers: the first weight is one. So that would be a pick of 0. That's not it. So then the next option, 'bar', has a weight of 3. So that will cover picks 1-3. But we have 5, so that's not in range, either. Next option has a weight of 4 - numbers 4-7. We have 5. Got it.

        That's the straight-forward approach. The convoluted approach reverses this. We start with the first option, and discover that it has a weight of 1. So, now we look at the number - if the number were 0, this would be a match. If the number were 1, it isn't. So we could simply check if $pick < $weight and return 1, otherwise we just subtract the weight from our pick, and check the next number. However, since we're in a block, not a sub, we need to be a bit more tricky. So, we do a bit of algebraic magic, and we come up with $pick - $weight < 0, and we still want to subtract the weight before going on to the next one. Finally, we just do the subtraction first, and let $pick < 0 be our boolean return from the block.