in reply to Re: Re: print scalar %hash... erm
in thread print scalar %hash... erm

Here's how a hash works, in a nutshell... You've got a list of buckets, each of which is a list of zero or more items. You also have a hash function that converts a key into a "hash" value used to look up a bucket. (For any given key, the hash function must always return the same result; otherwise you wouldn't be able to find your items again!)

When you want to do something with a specific key, you call the hash function for that key, use the result to find the appropriate bucket, and then search all the items in that bucket for the one you want. So, instead of having to check against every item in the hash, you only have to check against a few items that happen to be in the same bucket.

As you keep adding to the hash, each bucket has to hold more items. As the buckets get bigger, the lookups get slower, because you have to search through more items. That problem is solved by dynamically increasing the number of buckets and redistributing the items as the hash grows in size.

Your hash has 64 buckets (Perl always uses a power of 2), although only 42 are being used to store the 53 items. That just means that, with that combination of hash function, number of buckets, and keys, a few of the keys are sharing buckets.

Replies are listed 'Best First'.
Re: Re: Re: Re: print scalar %hash... erm
by larryk (Friar) on May 24, 2001 at 23:18 UTC
    > That problem is solved by dynamically increasing the number of buckets

    How does one go about this "easy-to-say" task?

      I can do this one :o)
      From perldoc perlfunc:

      As an lvalue `keys' allows you to increase the
      number of hash buckets allocated for the given
      hash. This can gain you a measure of efficiency
      if you know the hash is going to get big. (This
      is similar to pre-extending an array by assigning
      a larger number to $#array.) If you say

      keys %hash = 200;

      then `%hash' will have at least 200 buckets
      allocated for it--256 of them, in fact, since it
      rounds up to the next power of two. [...]

      As pileswasp pointed out, you can preallocate your hash by assigning to keys, similar to preallocating an array by assigning to $#array.

      Most of the time, of course, you'll just let Perl worry about the number of buckets. Here's how that works...

      When you add a new key to the hash, perl checks whether there are too many keys for the number of buckets. If that's the case, perl allocates a bunch more buckets. Then, perl goes through the whole hash, calls the hash function for each key, and figures out where it belongs in the newly-resized hash.

      You're probably thinking that resizing the hash and reinserting all the elements is slow; you're absolutely right. When perl has to resize a hash, it doubles the number of buckets, which keeps it from having to resize your hashes too often.

        Actually, Perl doesn't have to call the hash function for each key as the full hash value is stored along with the key so Perl just needs to compute the remainder between that value and the new number of buckets.

        Another question is whether Perl decides to allocate more buckets based on the number of buckets in use or on the number of keys. I remember being surprised by the answer to this and vaguely remember that the answer has changed. Unfortunately I don't have the time right now to dig this information up.

                - tye (but my friends call me "Tye")