Perl allocates a number of buckets per hash. Whenever a certain percentage of the buckets are filled with hash keys, the number of buckets gets doubled. Because of the hash algorithm a key then either stays in its bucket or has to go to the corresponding bucket of the new space.
When you print a hash in scalar context, it tells you how many buckets it has and how many are filled. Try this:
perl -e ' for $i (1..100) {$h{$e++}=1; print scalar(%h),"\n" }'
Note that not every key entry occupies a new bucket. That is the case when two different keys produce the same hash number and therefore occupy the same bucket.
| [reply] [d/l] |
| [reply] |
Perl starts with 8 "buckets", very nice and efficient for small hash tables. Each time the hash table doubles all the hash keys have to recomputed (one more bit is added to the binary key). If you know that your hash table is gonna be huge, you can save some computation by setting the number of buckets to be large to begin with by assigning a scalar value to %hash=16384 or whatever, Perl will round to the next higher power of 2 (ie, 16,000 is 16384). This doesn't matter for small hashes, but can save some time for huge ones and avoids the re-computations at doubling boundaries:(8,16,32,64,128,256,512,etc..). For like 90K hash keys, I would start with say 32768 (somewhat less than expected numkeys/2). You have to test to see if this matters at all in your application. I just got a few % increase in one of my apps which didn't amount to much, but then again, if I create something with 120K hash entries, I'm gonna whomp on it for awhile! And the mips to create the table just got dwarfed by what I did with it later. Your mileage may vary!
| [reply] |
Obviously, if you use longer keys, perl requires more memory to store them. Here's a simple program that creates a hash with given key length and shows how much memory it uses, so you can see how it depends:
use strict;
use warnings;
use Devel::Size qw(total_size);
my $length = $ARGV[0];
my $key = "a" x $length;
my %hash;
for ( 1 .. 1_000_000 ) {
$hash{$key} = 1;
$key++;
}
print "key size: $length\n";
print `ps -p $$ v`;
print total_size( \%hash ), "\n";
__END__
key size: 8
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
22551 pts/0 S+ 0:01 0 3 146484 128864 3.1 perl hash_k
+eys.pl 8
74388680
key size: 16
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
22557 pts/0 R+ 0:01 0 3 146484 128860 3.1 perl hash_k
+eys.pl 16
82388680
key size: 32
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
22563 pts/0 R+ 0:01 0 3 162116 144488 3.5 perl hash_k
+eys.pl 32
98388680
key size: 64
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
22569 pts/0 S+ 0:01 0 3 193384 175740 4.3 perl hash_k
+eys.pl 64
130388680
key size: 128
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
22575 pts/0 R+ 0:01 0 3 255936 238240 5.8 perl hash_k
+eys.pl 128
194388680
key size: 256
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
22581 pts/0 R+ 0:02 0 3 380928 363244 8.9 perl hash_k
+eys.pl 256
322388680
key size: 512
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
22587 pts/0 S+ 0:04 0 3 630924 613252 15.1 perl hash_k
+eys.pl 512
578388680
key size: 1024
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
22597 pts/0 R+ 0:07 0 3 1130868 1113256 27.4 perl hash
+_keys.pl 1024
1090388680
| [reply] [d/l] |