amarquis has asked for the wisdom of the Perl Monks concerning the following question:

I need a random element from a huge hash, one large enough that using keys to get a list of elements cripples my system. (Why didn't I use an array? I didn't plan to have to do this, and most of what I'm doing with this thing involves pulling single records by name.)

I looked at the answers at Pulling random elements from a hash, but they involve the keys method too. I'm thinking what I'm going to do, since this only needs to run once, is iterate through once using each to find the total number of elements, generate unique random numbers in that range, and then iterate through again, grabbing those records as I go.
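
Roughly, the plan looks like this (the data and the number of picks below are placeholders):

use strict;
use warnings;

my %records = map { "name$_" => "record$_" } 1 .. 1000;   # placeholder data
my $want    = 5;                                          # how many records I need

# pass 1: count the elements with each(), never building a key list
my $count = 0;
while ( my ($k) = each %records ) { $count++ }

# generate $want unique random positions in 0 .. $count - 1
my %wanted;
$wanted{ int rand $count } = 1 while keys %wanted < $want;

# pass 2: grab the records sitting at those positions
# (each() reset its iterator when pass 1 ran off the end)
my $i = 0;
my @picked;
while ( my ($k, $v) = each %records ) {
    push @picked, [ $k, $v ] if $wanted{ $i++ };
}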

This seems gross, but I backed myself into the corner of having to do it. I was wondering, though, if some of the minds here had a cleaner solution.

Side question: perldoc states that, for a given run of perl and an unmodified hash, each, keys, and values are guaranteed to have the same ordering as each other. What about repeated iterations of the same type? For example, if I use keys, process the results without changing the hash, and then use keys again, do I get the same ordering? My testing says yes, but is it guaranteed?

Re: Random hash element (low memory edition).
by Roy Johnson (Monsignor) on Jan 24, 2008 at 19:29 UTC
    You can find the number of elements by using keys in scalar context, and it won't take the time or space of actually going through the elements.
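
    For instance (a throwaway demo hash, not from the original post):

    my %hash  = map { $_ => 1 } 'a' .. 'e';
    my $count = keys %hash;   # scalar context: just the count, no key list is built
    print "$count\n";         # prints 5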

    But you could use the recipe for printing a random line from a file to iterate through the hash just once, without knowing how many elements it has, and select a random one.

    Yes, as long as you don't change the hash, keys will return keys in the same order every time. values will return values in the corresponding order.


    Caution: Contents may have been coded under pressure.

      That method is awfully neat, but my Perl skills are not, so I'll hit you with a follow-up question.

      rand($.) < 1 && ($line = $_) while <>;

      So this iterates through the file and, if a random number from 0 to the current line number is less than 1, selects the current line, overwriting previous choices? So, in my case, I'd just keep a count variable for how many elements I've been through and use that in place of $.?

      I'm going to have to check out the proof of that; it is really interesting that everything evens out (i.e., the first elements have a high chance of being overwritten, the later elements have a high chance of just not being picked, and it all balances out to a uniformly random selection). Thanks for pointing me to that FAQ entry!
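
      Update: the proof telescopes nicely, as far as I can tell: element k is picked at step k with probability 1/k and survives each later step j with probability (j-1)/j, so its overall chance is (1/k) * (k/(k+1)) * ... * ((n-1)/n) = 1/n for every k. A quick simulation (my own sanity check, not from the FAQ) agrees:

      use strict;
      use warnings;

      # run the rand($n) < 1 selection 100_000 times over ten elements
      # and confirm each element is picked about 10% of the time
      my %picks;
      for ( 1 .. 100_000 ) {
          my ( $chosen, $n ) = ( undef, 0 );
          for my $elem ( 'a' .. 'j' ) {
              rand( ++$n ) < 1 and $chosen = $elem;
          }
          $picks{$chosen}++;
      }
      printf "%s: %.3f\n", $_, $picks{$_} / 100_000 for sort keys %picks;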

        So, in my case, I'd just keep a count variable for how many elements I've been through and use that in place of $.?

        Exactly.


        Caution: Contents may have been coded under pressure.
Re: Random hash element (low memory edition).
by BrowserUk (Patriarch) on Jan 24, 2008 at 20:06 UTC

    You could use something like this:

    Code updated per ikegami's post below.

    sub getRandomPairFromHash {
        my $href  = shift;
        my $nKeys = keys %$href;           # scalar context: count only, no key list
        my $n     = 1 + int rand $nKeys;   # step count in 1 .. $nKeys
        my ( $key, $val );
        ( $key, $val ) = each %$href while $n--;
        return ( $key, $val );
    }

    It does work and doesn't increase the memory consumption, but fast it's not.
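
    For instance, a minimal usage sketch (the demo hash here is made up):

    my %bighash = map { "key$_" => "value$_" } 1 .. 100_000;
    my ( $key, $value ) = getRandomPairFromHash( \%bighash );
    print "$key => $value\n";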


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      That fails when $n == 0 and when $n == 1. One way to fix it is to apply both of the following changes:

      Change
      while --$n
      to
      while $n--

      Change
      my $n = int rand $nKeys;
      to
      my $n = 1 + int rand $nKeys;

      Update: Might as well post the modified code:

      sub getRandomPairFromHash {
          my $href  = shift;
          my $nKeys = keys %$href
              or return ( undef, undef );
          my $n = 1 + int rand $nKeys;
          my ( $key, $val );
          ( $key, $val ) = each %$href while $n--;
          return ( $key, $val );
      }
Re: Random hash element (low memory edition).
by dave_the_m (Monsignor) on Jan 24, 2008 at 22:52 UTC
    The order in which keys are returned by keys/each/values is supposed to be random (not cryptographically random, but explicitly coded as random), with the random factor being different for each run (see PERL_HASH_SEED).

    So why not just grab the first element of the hash? Or, if you need more than one, grab the first N? Or, if you need a single random element from time to time, either maintain a counter and grab element $n++ using the methods discussed elsewhere in this thread, or, if you can guarantee that the hash's iterator won't be used elsewhere, use 'each' to get the next element, as in the sketch below.
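
    A minimal sketch of the 'each' route (next_pair is just my name for it, and it assumes nothing else touches the hash's iterator):

    sub next_pair {
        my $href = shift;
        my ( $k, $v ) = each %$href;
        # each() returns an empty list (and resets itself) once per
        # full pass, so call it again to wrap around to the start
        ( $k, $v ) = each %$href if !defined $k;
        return ( $k, $v );
    }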

    Dave.

      The order in which keys are returned by keys/each/values is supposed to be random (not cryptographically random, but explicitly coded as random), with the random factor being different for each run (see PERL_HASH_SEED).

      If they are, then my perl is quite buggy.

      I ran perl -MData::Dumper -we 'print Dumper \%ENV' a few times, both with perl 5.8.8 and 5.10.0 (on Debian Etch i386), and I always got the same order.

      (And this is no Data::Dumper artifact - it works with join ", ", keys %ENV as well.)

      It is true that you shouldn't rely on a particular hash order, but it doesn't mean a randomized order is guaranteed.

        My mistake. The first cut of the hash randomisation code always randomised, but then it was changed to only randomise in pathological cases, to avoid too much code breakage.

        Dave.

      The order in which keys are returned by keys/each/values is supposed to be random (not cryptographically random, but explicitly coded as random), with the random factor being different for each run (see PERL_HASH_SEED).
      Uh, it may be different each time you run the script, but if you iterate through the same hash twice in your script, you'll get the same order twice.
Re: Random hash element (low memory edition).
by bart (Canon) on Jan 27, 2008 at 12:49 UTC
    You can adapt the code from perlfaq5's (and, as it notes, the Camel book's) How do I select a random line from a file? to iterate through your hash once, instead of through the lines of a file.
    my $chosen;
    my $n = 0;
    while ( my ($k, $v) = each %hash ) {
        rand(++$n) < 1 and $chosen = $v;   # keep the Nth pair with probability 1/N
    }
    return $chosen;
        The advantage? I'm not sure there is one. It is a different approach from the one you proposed in Re: Random hash element (low memory edition)., and it is not worse. That is interesting in itself.

        It could even be beneficial in the case of a tied hash, as a call to keys makes it loop through all the keys of the tied hash just to get the count. So, in that case, your code will loop through them at least once to get the count, plus a fraction of a cycle (up to a complete cycle) more than that. I assume my approach could be faster there.

        But note that the Big O is still the same for both approaches: O(n), where n is the number of hash keys.

        It might become more interesting if one wants a weighted probability: then, in my code, you "just" have to replace the test condition with a different function of rand() and a weight that depends on the current key. But the structure of the code can remain the same. That's a huge advantage.

        I have derived a usable function in a journal entry on use.perl.org.

        With weighted probabilities (the weight is a positive number that is proportional to the desired odds that this item is picked; if it's zero, the item is skipped), the code can be:

        my ( $threshold, $value );
        while ( my ($k, $v) = each %hash ) {
            if ( my $weight = weight($k) ) {   # function call
                my $rand = -log(1 - rand) / $weight;
                if ( not defined $threshold or $rand < $threshold ) {
                    $threshold = $rand;
                    $value     = $v;
                }
            }
        }
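
        The -log(1 - rand)/$weight expression draws an exponential random variate with rate $weight, and keeping the minimum means each item wins with probability proportional to its weight. As a usage sketch, weight() can be any mapping from a key to a positive number; this stand-in (purely for illustration, not from the journal entry) favours longer keys:

        sub weight { length $_[0] }   # hypothetical weight: odds proportional to key length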