in reply to strange keys with a hash

Perl allocates a number of buckets per hash. Whenever a certain percentage of the buckets are filled with hash keys, the number of buckets gets doubled. Because of the hash algorithm a key then either stays in its bucket or has to go to the corresponding bucket of the new space.

When you print a hash in scalar context, it tells you how many buckets it has and how many are filled. Try this:

perl -e ' for $i (1..100) {$h{$e++}=1; print scalar(%h),"\n" }'

Note that not every key entry occupies a new bucket. That is the case when two different keys produce the same hash number and therefore occupy the same bucket.

Replies are listed 'Best First'.
Re^2: strange keys with a hash
by zwon (Abbot) on Jan 30, 2009 at 20:20 UTC

    Wow! I just recently couldn't figure out why one function returns me '1/8', so now I've got it. Thanks for that!

      Perl starts with 8 "buckets", very nice and efficient for small hash tables. Each time the hash table doubles all the hash keys have to recomputed (one more bit is added to the binary key). If you know that your hash table is gonna be huge, you can save some computation by setting the number of buckets to be large to begin with by assigning a scalar value to %hash=16384 or whatever, Perl will round to the next higher power of 2 (ie, 16,000 is 16384). This doesn't matter for small hashes, but can save some time for huge ones and avoids the re-computations at doubling boundaries:(8,16,32,64,128,256,512,etc..). For like 90K hash keys, I would start with say 32768 (somewhat less than expected numkeys/2). You have to test to see if this matters at all in your application. I just got a few % increase in one of my apps which didn't amount to much, but then again, if I create something with 120K hash entries, I'm gonna whomp on it for awhile! And the mips to create the table just got dwarfed by what I did with it later. Your mileage may vary!