Boogman has asked for the wisdom of the Perl Monks concerning the following question:

The perlfunc manpage says about the keys function:
As an lvalue keys allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big.

I decided to try benchmarking it to see how much it did increase efficiency and ran this code:
use Benchmark;
timethese( 100, {
    with => q{
        my %hash;
        keys( %hash ) = keys( %hash ) = 10000;
        foreach $i (1 .. 10000) { $hash{"key$i"} = $i; }
    },
    without => q{
        my %hash;
        foreach $i (1 .. 10000) { $hash{"key$i"} = $i; }
    },
});

My results from this show that using keys didn't actually increase efficiency at all...
Benchmark: timing 100 iterations of with, without...
      with: 14 wallclock secs (13.73 usr + 0.04 sys = 13.77 CPU) @ 7.26/s (n=100)
   without: 14 wallclock secs (13.86 usr + 0.00 sys = 13.86 CPU) @ 7.22/s (n=100)

I would have thought that since this saves the hash from constantly having to be expanded, it would greatly increase efficiency. Printing the number of buckets filled and the total number of buckets shows that without pre-allocating, the hash gets reallocated repeatedly as it grows, whereas it doesn't when you do pre-allocate:
with pre-allocation:  1/16384 2/16384 3/16384 4/16384 5/16384 6/16384 7/16384 8/16384 9/16384 10/16384
vs.
without:              4/8 5/8 6/8 7/8 7/8 7/8 7/8 7/8 12/16 13/16 13/16 13/16 etc...
Am I doing something wrong in my benchmark? Or is there some magic going on behind the scenes of hashes that makes expanding the hash a non-issue? Thanks
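
For reference, the used/total counts shown above were obtained by evaluating the hash in scalar context while inserting. A minimal sketch, assuming a perl of this era (before 5.26), where a non-empty hash in scalar context reports "used/total" buckets rather than just the key count:

    my %hash;
    keys( %hash ) = 10000;                    # pre-extend the bucket array
    foreach my $i (1 .. 10000) {
        $hash{"key$i"} = $i;
        print scalar( %hash ), "\n";          # prints e.g. "9/16384"
    }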

Re: Using keys to increase hash efficiency
by lhoward (Vicar) on Jul 28, 2000 at 18:47 UTC
    Since the non-pre-allocated hash (approximately) doubles its bucket space every time it has to expand, the number of reallocations is small compared to the number of hash inserts you are doing and has a negligible effect on your hash performance. The roughly 10 to 12 reallocations the hash had to go through to accommodate 10,000 elements are a small chunk of time compared with the 10,000 hash inserts. As your benchmarks show, the pre-allocated hash is faster over the data set, but only slightly so.
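
    To put a number on that doubling: a quick back-of-the-envelope sketch (assuming the bucket array starts at 8 and simply doubles each time it fills, which illustrates the growth pattern but is not necessarily perl's exact policy):

        my $buckets   = 8;        # assumed starting size
        my $doublings = 0;
        while ( $buckets < 10_000 ) {
            $buckets *= 2;
            $doublings++;
        }
        print "$doublings doublings to reach $buckets buckets\n";   # 11 doublings, 16384 buckets

    Eleven grows versus 10,000 inserts is why the difference barely shows up.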
RE: Using keys to increase hash efficiency
by mikfire (Deacon) on Jul 28, 2000 at 20:09 UTC
    I would also claim that your hash isn't big yet. Increase the number of keys by an order of magnitude or so. Then try it.

    Mik mikfire

Re: Using keys to increase hash efficiency
by Boogman (Scribe) on Jul 28, 2000 at 21:24 UTC
    I did the benchmarks again using 100,000 keys while pre-allocating 200,000, and the difference did start to show a bit more.
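    (The exact code for this run isn't shown; assuming it kept the shape of the original benchmark, it would have looked roughly like this:)

        use Benchmark;
        timethese( 10, {
            with => q{
                my %hash;
                keys( %hash ) = 200000;
                foreach $i (1 .. 100000) { $hash{"key$i"} = $i; }
            },
            without => q{
                my %hash;
                foreach $i (1 .. 100000) { $hash{"key$i"} = $i; }
            },
        });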
    Benchmark: timing 10 iterations of with, without...
          with: 17 wallclock secs (13.56 usr + 0.24 sys = 13.80 CPU) @ 0.72/s (n=10)
       without: 17 wallclock secs (15.34 usr + 0.02 sys = 15.36 CPU) @ 0.65/s (n=10)
    I tried it using the rand idea, but after 100,000 repetitions it had only allocated 20469/32768 buckets and the benchmarks weren't different at all... I also tried even more keys, but started getting "Out of memory" errors. :-) Thanks for the responses.
Re: Using keys to increase hash efficiency
by ferrency (Deacon) on Jul 28, 2000 at 20:50 UTC
    You might try using a random (or at least less predictable) set of keys, instead of just incrementing by one each time. I don't have any evidence to back this up, but my intuition tells me sequential key names aren't a very good test here. Seed the random number generator with the same seed inside each code block to ensure you're getting the same sequence of keys in each.

    Also, if you only preallocate to 10000 buckets and you're inserting 10000 keys, it's going to need to reallocate at least once anyway (the largest reallocation, therefore the slowest?). A better test might be to preallocate 20000 buckets, and then do 10000 insertions.
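
    Putting those two suggestions together, a sketch of such a run might look like this (the seed of 42 and the 1e9 key range are arbitrary choices for illustration):

        use Benchmark;
        timethese( 100, {
            with => q{
                my %hash;
                keys( %hash ) = 20000;
                srand( 42 );       # same seed in both blocks => same key sequence
                foreach $i (1 .. 10000) { $hash{ "key" . int( rand( 1e9 ) ) } = $i; }
            },
            without => q{
                my %hash;
                srand( 42 );
                foreach $i (1 .. 10000) { $hash{ "key" . int( rand( 1e9 ) ) } = $i; }
            },
        });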

    Does anyone know how clever perl's hashing function is? And how does Perl deal with key hash collisions? How feasible is it that Perl might be getting a significant enough number of hash collisions that the number of buckets in the hash is insignificant?

    Alan

    Update: Thanks for the pointer to the information about the Perl hashing function. After reading that, I can confidently say that in older versions of Perl, inserting sequential keys would cause a lot of hash collisions, but with the new hashing function, that shouldn't be a problem.
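
    As an aside, the mechanics in question can be shown with a toy example: the key string is hashed to an integer, the integer is masked down to a bucket index (the bucket count is always a power of two), and keys that land in the same bucket get chained together. This is only an illustration of the mechanism, not Perl's actual hashing function:

        # Toy multiplicative string hash, masked to a power-of-two bucket count.
        # Illustration only; not the function any particular perl uses.
        sub toy_hash {
            my ( $key, $nbuckets ) = @_;             # $nbuckets must be a power of two
            my $h = 0;
            $h = ( $h * 33 + ord($_) ) & 0xFFFFFFFF for split //, $key;
            return $h & ( $nbuckets - 1 );           # bucket index; same index => chained entries
        }

        # Where would "key1" .. "key10" land in a 16-bucket hash?
        print join( " ", map { toy_hash( "key$_", 16 ) } 1 .. 10 ), "\n";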

      I played around with this thread a bit, but I haven't come to any earth-shattering conclusions yet, so in the meantime: http://www.deja.com/getdoc.xp?AN=650577132&fmt=text This is the best explanation of Perl's hashing I've ever seen. -dlc