Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Pick any key from a very large hash

by FloydATC (Deacon)
on Jul 11, 2009 at 21:46 UTC ( [id://779258]=perlquestion: print w/replies, xml ) Need Help??

FloydATC has asked for the wisdom of the Perl Monks concerning the following question:

I have a very large (DB_File) hash which may have millions of keys in it, and I want to pick just one of them. Note that I do NOT want a truly random key, this is not easy way to get random key from hash?. I just want a key, exactly which one is not important. (The hash contains a pool of resources which may be added and removed arbitrarily)

I tried experimenting with my ($key,$value) = each %hash but I'm getting... weird results. I'm thinking this is probably because each is meant for looping through the entire hash while I'm only calling it now and then, in between adding and deleting key/value pairs.

Using keys or splice()is out of the question, the temporary array would just kill me. Any bright ideas? Is there perhaps a way to "reset" each after an incomplete scan?

-- Time flies when you don't know what you're doing

Replies are listed 'Best First'.
Re: Pick any key from a very large hash
by mzedeler (Pilgrim) on Jul 11, 2009 at 21:55 UTC

    Did you try the seq method in DB_File?

      No, to be honest I did't understand what it was for until you suggested it. The syntax had me assuming I had to pass a valid key as a starting point, but examining the example code closer revealed the truth. I gave it a whack and it works perfectly. Thank you :-)

      -- Time flies when you don't know what you're doing
Re: Pick any key from a very large hash
by jwkrahn (Abbot) on Jul 11, 2009 at 22:23 UTC

    The documentation for each explains how to reset it (hint: third paragraph).

      Yes, by using keys or values... exactly what I don't want :-)

      -- Time flies when you don't know what you're doing
        "In particular, calling keys() in void context resets the iterator with no other overhead."

        Or: "it can be reset by reading all the elements from the hash"

      I don't think your suggestion is a viable option when the hash contains millions of keys...

      Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

        Calling keys in void context resets the iterator and does nothing else. The number of keys in the hash is of no import.
Re: Pick any key from a very large hash
by Bloodnok (Vicar) on Jul 11, 2009 at 22:26 UTC
    ...each is meant for looping through the entire hash... - you're dead right - the hash is meant to be invariant over each - from each '...If you add or delete elements of a hash while you're iterating over it, you may get entries skipped or duplicated, so don't...'.

    In answer to your question, again from each - '..it can be reset by reading all the elements from the hash, or by evaluating keys HASH or values HASH...'.

    The moral of the story/posting is ... please RTFM before you post.

    A user level that continues to overstate my experience :-))
      Hehe I did RTFM but you know what happens when you get stuck... you start trying creative stuff even if you know it might break things in funny ways :-)

      My theory was that maybe there was some obscure way to manually whack it back to zero. Guess not. Anyway, the problem is solved :-)

      -- Time flies when you don't know what you're doing

        I realise it's not an issue now, but it seems likely that calling keys in scalar context will reset the each iterator in (very small) constant time. Indeed a small test confirms this:

        use strict; use warnings; use Benchmark qw(cmpthese); my %bigHash = map {$_ => undef} 1 .. 1e3; my @resetEach = keys %bigHash; my ($key) = each %bigHash; my $count = keys %bigHash; my ($key2) = each %bigHash; print "First key '$key', second '$key2'\n"; cmpthese ( -1, { scalarKeys => sub {scalar keys %bigHash}, listKeys => sub {() = keys %bigHash}, } );

        Prints:

        First key '559', second '559' Rate listKeys scalarKeys listKeys 5184/s -- -100% scalarKeys 15969899/s 307964% --

        True laziness is hard work
Re: Pick any key from a very large hash
by ashish.kvarma (Monk) on Jul 12, 2009 at 04:28 UTC
    If you don’t want to use "keys", here is a fancy way to do it.
    my %hash = ( a => 1, b => 2, c => 3, d => 4, e => 5, f => 6, g => 7, h => 8, i => 9, j => 10, ); my $hash_length = %hash; # To get actual length you would need 'scalar keys %hash' # But we are not using 'keys' here so lets get the length other way $hash_length =~ s{/.*}{}; foreach (1...10) { my $rand_num = rand $hash_length; # Make sure number is even. $rand_num = ($rand_num %2) ? ($rand_num - 1) : ($rand_num); my $key = (%hash)[$rand_num]; print "$_\t: $key\n"; } # output # 1 : e # 2 : d # 3 : c # 4 : j # 5 : a # 6 : e # 7 : e # 8 : c # 9 : j # 10 : e
    Regards,
    Ashish

      OP's concern was not the use of keys, but the potential cost of using it due to a perceived 'temporary array'. However various solutions to that problem have been suggested. Let's see how yours stacks up:

      use strict; use warnings; use Benchmark qw(cmpthese); my %bigHash = map {$_ => undef} 1 .. 1e3; my ($key) = each %bigHash; my $count = keys %bigHash; my ($key2) = (%bigHash)[0]; print "First key '$key', second '$key2'\n"; cmpthese ( -1, { byEach => sub {scalar keys %bigHash; my ($key) = each %bigHas +h;}, byArray => sub {my $key = (%bigHash)[0]}, } );

      Prints:

      First key '559', second '559' Rate byArray byEach byArray 5228/s -- -100% byEach 2414187/s 46082% --

      It's that 'temporary array' OP was worried about. A fancy solution no doubt but, for saving time, not a good one.


      True laziness is hard work
        I jumped in without giving it a thought and missed the real problem. I do agree it wouldn’t be efficient therefore won’t solve the purpose.
        I guess I gotta give some time before jumping in.
        Regards,
        Ashish
      Still, it's a creative solution to the problem, and within the original domain. I'm sure this technique can be helpful when dealing with a big native hash and not a DB_File.

      -- Time flies when you don't know what you're doing
Re: Pick any key from a very large hash
by psini (Deacon) on Jul 11, 2009 at 22:01 UTC

    Can't you simply exit from the each when you are done, and maybe include it in an external loop of some kind?

    Or have I completely missed the point?

    Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

      I tried using each without looping at all, which caused weird results when arbitrary pairs were added or deleted between each each.

      exit would be taking things a bit too far, perhaps you meant last? :-) Alternatively, if you mean use a separate process, this would defeat the DB_File built-in cache and generally kill performance.

      OR maybe I'm the one missing the point...

      -- Time flies when you don't know what you're doing

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://779258]
Approved by lidden
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (3)
As of 2024-04-23 06:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found