in reply to how are the number of buckets in a perl hash chosen

Perl hashing algorithm has changed once or twice ... http://perl5.git.perl.org/perl.git/blob?f=hv.c, Re^2: Hash order randomization is coming, are you ready?

$ perl -e " keys(%f)=7931; $f{1}=1; die scalar %f" 1/8192 at -e line 1.
7931 is not a prime 8192 is not a prime but the Prime factorization is ... ??? ... ;D ... 2**13

So there you have it, I'm sure perldata or keys talks about this, presizing hash, the bucketizing to the power of two ...

Replies are listed 'Best First'.
Re^2: how are the number of buckets in a perl hash chosen
by david2008 (Scribe) on Jun 09, 2014 at 08:31 UTC
    If i did understand exactly, from your example, that the power of 2 should be a prime.
    I tried it on different examples and saw that this is not correct every time.
    perl -E "%c = (3=>5);keys(%c) = 33;say scalar(%c)" 1/64
    Maybe i did not understand it 100%.
    I looked at perldata and keys and saw indeed that the number of buckets is a power of 2 but it was not explained why.

      If i did understand exactly, from your example, that the power of 2 should be a prime.

      I don't know, wolfram says "prime factorization"...

      I looked at perldata and keys and saw indeed that the number of buckets is a power of 2 but it was not explained why.

      Oh, why? because :) its coded that way :D

      newsize &= ~(newsize & (1 + ~newsize)); /* get proper power of 2 */ .. Newxz(a, PERL_HV_ARRAY_ALLOC_BYTES(newsize), char);

      Programmers just love to do things in powers of two

      perl hash presize why power of 2 How Hashes Really Work - Perl.com

      In principle, a hashing function returns an array index directly; in practice, it is common to use its (arbitrary) return value modulo the number of buckets as the actual index. (Using a prime number of buckets that is not too close to a power of two tends to produce a sufficiently uniform key distribution.)
      A Hash Function for Hash Table Lookup
      The best hash table sizes are powers of 2. There is no need to do mod a prime (mod is sooo slow!).
      Hash table
      In the case that the array size is a power of two, the remainder operation is reduced to masking, which improves speed, but can increase problems with a poor hash function.

      So there you have it, power of two, its for the speed