Hash tables, are they really what we see?

heatblazer has asked for the wisdom of the Perl Monks concerning the following question:

Hello again.

Recently I read a post about hash tables. Not Perl's ones, but hashes as an ADT (mainly C ones). The point of the thread was that hashes are not necessarily the best idea and are overused by many; the better solutions suggested there were trees (red-black) or just lists. So what I wonder is: how does Perl know your hashes, and how does it find everything so fast? I've read a few Perl books that worship hashes like magic: you just refer to a key and here is your value. But let's assume we have a hash with, for example, over 1 000 000 entries as a word count, and we then search for a word that just does not appear there. How in blazes will Perl know that the word is not among a million words!? I picture a barrel filled with over a million red and blue balls, where you have to say you are 100% sure there are no other colors just by looking at that barrel.

It's not a question of Perl's powers; it's a question of knowledge, and something that just interests me. Also, are linked lists or trees implemented in Perl a good idea, since they are not built-in generic types? And how can you handle the memory gobbling if you make a list or a tree that can grow big with lots of data in no time? Can Perl manage chunks of memory?

Edited

Here is the link: about the hash tables

Edited

Thank you, everybody, for the explanations and useful information you've provided. I was able to understand some of the magic behind Perl's hashes. Now I see how big the difference is between a C hash and a Perl hash.

Re: Hash tables, are they really what we see? by GrandFather (Saint) on Oct 03, 2012 at 04:46 UTC
To a very large extent, just let Perl do its thing. Perl has been optimised to do quickly the things programmers do often. Almost always, a list or tree in other languages is a solution to an intermediate problem that you can solve directly using Perl's hash or dynamic array structures.

A Perl hash is an associative array that stores values accessed by keys. Under the hood the key is turned into a "hash" (hence the name of the structure), which Perl is very quick at looking up. The whole point of a hash-type data structure is that lookup is fast (due to the hash magic), and finding that something isn't there is just as fast as finding that it is there. In both cases the time is essentially a small constant. The trick is that a hashing function turns the key into an index into a hash table, so (simplifying greatly) the time to find (or not find) an entry is the time the hash function takes plus the time to index into the table: short and almost constant.

Hashes are generally the answer to problems you might solve using trees in other languages. Perl is very time-efficient at managing dynamic arrays, which can easily be used in the ways you might want to use linked lists. In particular, Perl's arrays are fast for adding and removing blocks of elements at either end, and pretty fast for adding and removing blocks of elements elsewhere in the array. Under the hood Perl does clever stuff with linked lists, but you don't need to know that.

True laziness is hard work
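To make the "key turns into an index" step concrete, here is a small Python sketch. It uses Bob Jenkins' one-at-a-time hash purely as an example of a string hash function; which function a particular Perl build uses varies between versions and is not assumed here. The bucket count is assumed to be a power of two, so the index is just the low bits of the hash. The names `one_at_a_time` and `bucket_index` are hypothetical helpers for illustration.

```python
def one_at_a_time(key: bytes) -> int:
    """Bob Jenkins' one-at-a-time hash, truncated to 32 bits."""
    mask = 0xFFFFFFFF
    h = 0
    for byte in key:
        h = (h + byte) & mask
        h = (h + (h << 10)) & mask
        h ^= h >> 6
    h = (h + (h << 3)) & mask
    h ^= h >> 11
    h = (h + (h << 15)) & mask
    return h


def bucket_index(key: str, num_buckets: int) -> int:
    """Hypothetical helper: map a key to a bucket (num_buckets: power of two)."""
    if num_buckets <= 0 or num_buckets & (num_buckets - 1):
        raise ValueError("num_buckets must be a power of two")
    # Masking with (num_buckets - 1) keeps the low bits, same as hash % num_buckets.
    return one_at_a_time(key.encode("utf-8")) & (num_buckets - 1)


if __name__ == "__main__":
    for word in ("apple", "banana", "zebra"):
        # The index depends only on the key, so a lookup goes straight to one slot.
        print(word, bucket_index(word, 8))
```

The cost of a lookup is one pass over the key's bytes plus one array index, no matter how many keys the table holds.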
Re^2: Hash tables, are they really what we see? by heatblazer (Scribe) on Oct 03, 2012 at 06:29 UTC
Thank you for the reply. I'll leave it as it is.
Re: Hash tables, are they really what we see? by dsheroh (Monsignor) on Oct 03, 2012 at 08:26 UTC
"But let's assume we have a hash with, for example, over 1 000 000 entries as a word count, and we then search for a word that just does not appear there. How in blazes will Perl know that the word is not among a million words!? I picture a barrel filled with over a million red and blue balls, where you have to say you are 100% sure there are no other colors just by looking at that barrel."

Your barrel is the wrong image for visualizing a hash. The magic of hashes is that you don't do hash searches, you do hash lookups. When you ask for `$hash{foo}`, Perl doesn't have to examine every key in `%hash` to see whether `foo` is among them. Instead, it calculates "If a key `foo` exists, then it will be in this location," then looks only in that location.

For a better real-world image, think of a hotel with mailboxes on the wall behind the front desk. When you want to see whether you have any mail, you don't tell the clerk your name and then wait for them to check every piece of mail to see who it's addressed to. Instead you say "I'm in room 234" and the clerk looks only at the box numbered 234. If there's anything in that box, you have mail; if there isn't, you don't. (In this example, your stating your room number is analogous to the hashing function Perl uses to map hash keys to "buckets" in the hash.)
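The mailbox picture can be sketched as a tiny hash table with separate chaining (one list per bucket). This is a simplified Python illustration, not Perl's actual implementation: it assumes Python's built-in `hash()` as the hashing function and a fixed number of buckets, and the class `MailboxTable` and its methods are hypothetical. The point is that `lookup` only compares the keys stored in one bucket, whether or not the key is present.

```python
class MailboxTable:
    """Hypothetical toy hash table: each bucket is a list of (key, value) pairs."""

    def __init__(self, num_buckets: int = 1024) -> None:
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key: str) -> int:
        # The "room number": computed from the key alone.
        return hash(key) % len(self.buckets)

    def store(self, key: str, value: int) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def lookup(self, key: str):
        """Return (value or None, number of keys compared)."""
        bucket = self.buckets[self._index(key)]
        compared = 0
        for k, v in bucket:
            compared += 1
            if k == key:
                return v, compared
        return None, compared


if __name__ == "__main__":
    table = MailboxTable(num_buckets=2048)
    for n in range(1000):
        table.store(f"word{n}", n)
    value, compared = table.lookup("nonexistent")
    # Only the few keys sharing that one bucket were compared, not all 1000.
    print(value, compared)
```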
Re^2: Hash tables, are they really what we see? by heatblazer (Scribe) on Oct 03, 2012 at 08:56 UTC
In the abstract, you mean that, let's say, it pre-reserves cells in memory that are expected to be either full or empty. So when I tell it, "Perl, I want a key name1, which I don't remember whether I stored," it looks into some address like 0xFF00BB3012, which it expects to belong to the hash, and tells me: "Nope, I don't have what you want at that address, so I won't give you the corresponding value." Quite abstract stuff, but as far as I understand it, perl kind of flags these addresses and then accesses them directly with low-level operations, which are too much for me to bear...
Re^2: Hash tables, are they really what we see? by heatblazer (Scribe) on Oct 03, 2012 at 09:09 UTC
Also, your answer enlightened me as to why hashes are unsorted: to perl they are actually ordered by memory address, and we just see them as unsorted because that isn't the order we expected. ... if my logic is right.
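This intuition can be checked with a toy table: walking a hash visits buckets in index order, so keys come out in an order set by their hash values and the table size, not by insertion or alphabetical order. A minimal Python sketch, assuming a simple chained table and a hypothetical `keys_in_bucket_order` helper; real Perl iteration order also depends on the hash function and, since Perl 5.18, on per-process randomization. When order matters, sort the keys explicitly.

```python
def keys_in_bucket_order(keys, num_buckets=8):
    """Hypothetical helper: bucket the keys, then walk buckets from index 0 up."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[hash(key) % num_buckets].append(key)
    # Iteration order is bucket order: it follows hash values, not insertion.
    return [key for bucket in buckets for key in bucket]


if __name__ == "__main__":
    words = ["pear", "apple", "fig", "banana", "cherry"]
    print(keys_in_bucket_order(words))  # some hash-dependent order
    print(sorted(words))                # explicit ordering when you need it
```

Python's string hashes are randomized per process by default, so the first line can differ from run to run, much like Perl's key order.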
Re: Hash tables, are they really what we see? by remiah (Hermit) on Oct 03, 2012 at 05:19 UTC
Hello. I did a benchmark of Perl hashes vs. in-memory SQLite several days ago. The number of records you are interested in seems similar to my case; have a look at this thread if you are interested. Hash lookup is fast, but there are cases where in-memory SQLite is better than a hash.
Re^2: Hash tables, are they really what we see? by heatblazer (Scribe) on Oct 03, 2012 at 06:27 UTC
Thank you for that nice reference. It was interesting to read your benchmark. If I get to a huge project like that one, I'll most definitely ask for wisdom.
Re^3: Hash tables, are they really what we see? by AnomalousMonk (Archbishop) on Oct 03, 2012 at 07:13 UTC
Wisdom is sometimes useful even if one is not involved in a huge project.
Re: Hash tables, are they really what we see? by AnomalousMonk (Archbishop) on Oct 03, 2012 at 07:35 UTC
"Recently I read a post about hash tables."

It is very often helpful to your fellow monks if you can provide a link to a post or online article that you reference. Please see What shortcuts can I use for linking to other information?
Re^2: Hash tables, are they really what we see? by heatblazer (Scribe) on Oct 03, 2012 at 08:20 UTC
Noted. I'll add the thread as an edit. Take a look at my post again.
Re: Hash tables, are they really what we see? by locked_user sundialsvc4 (Abbot) on Oct 03, 2012 at 13:45 UTC
The analogy I use for hashes is a set of post-office boxes ... which, for the purposes of the analogy, might be shared among a number of different people. Based on what the envelope of an incoming piece of mail says, it will always be placed into exactly one box. When someone comes to ask for their mail, you look only in that box, but you still might have to look through the contents of that one box to find what you're looking for.

Some database systems provide "hash" indexes. They are fast and efficient, provided that both the data distribution and the hash function h() are such that there is not an excessive number of collisions ... where too much mail winds up in the same box.

Hash structures are low-maintenance, but they are also random-access-only: you can't naturally iterate through values in key order. (Although a hash variable that is `tie`d to a Berkeley DB B-tree file, i.e. DB_File's `$DB_BTREE` format, will yield its contents in key order to `each()`.) Hashes do not require the rebalancing steps that are required by trees. They are so useful, and therefore so often used, that every major language now has a rock-solid, high-performance implementation of them.
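The "too much mail in the same box" caveat can be shown by swapping in a deliberately poor hash function. In this Python sketch (hypothetical helper names; not how Perl hashes keys), `bad_hash` uses only the key's length, so keys of equal length all collide and a lookup degrades into a linear scan of one long bucket, while `good_hash` (assumed here to be Python's built-in `hash()`) spreads the same keys across the boxes.

```python
def build_buckets(keys, hash_fn, num_buckets=64):
    """Hypothetical helper: distribute keys into chained buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[hash_fn(key) % num_buckets].append(key)
    return buckets


def bad_hash(key: str) -> int:
    # Every key of the same length lands in the same box.
    return len(key)


def good_hash(key: str) -> int:
    return hash(key)


def longest_chain(buckets) -> int:
    """Worst-case number of keys a single lookup might have to compare."""
    return max(len(b) for b in buckets)


if __name__ == "__main__":
    keys = [f"key{n:05d}" for n in range(1000)]  # all 8 characters long
    print("bad hash, longest chain:", longest_chain(build_buckets(keys, bad_hash)))
    print("good hash, longest chain:", longest_chain(build_buckets(keys, good_hash)))
```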
Re^2: Hash tables, are they really what we see? by QM (Parson) on Oct 04, 2012 at 09:24 UTC
"Hashes do not require the rebalancing steps that are required by trees."

Well, yes they do. Anything that grows dynamically, without knowing the whole list ahead of time, may need to be rebalanced (for a hash, that means allocating more buckets and redistributing the keys). But Perl does that for you.

-QM
--
Quantum Mechanics: The dreams stuff is made of
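For a hash, this "rebalancing" is growth: when the table gets too full, a larger bucket array is allocated and every key is redistributed. A minimal Python sketch, assuming a load-factor limit of one key per bucket, power-of-two sizes, and doubling on growth; the threshold and the `GrowingTable` class are hypothetical choices for illustration, not a description of Perl's internals.

```python
class GrowingTable:
    """Hypothetical chained hash table that doubles its buckets when full."""

    def __init__(self, num_buckets: int = 8) -> None:
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0

    def _index(self, key, num_buckets: int) -> int:
        # Power-of-two sizes let us mask instead of taking a modulus.
        return hash(key) & (num_buckets - 1)

    def store(self, key, value) -> None:
        bucket = self.buckets[self._index(key, len(self.buckets))]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.count > len(self.buckets):  # assumed load-factor limit of 1
            self._grow()

    def _grow(self) -> None:
        new_size = 2 * len(self.buckets)
        new_buckets = [[] for _ in range(new_size)]
        for bucket in self.buckets:
            for key, value in bucket:
                # Every key is re-indexed because the mask has changed.
                new_buckets[self._index(key, new_size)].append((key, value))
        self.buckets = new_buckets

    def fetch(self, key):
        for k, v in self.buckets[self._index(key, len(self.buckets))]:
            if k == key:
                return v
        return None


if __name__ == "__main__":
    table = GrowingTable()
    for n in range(100):
        table.store(f"word{n}", n)
    print(len(table.buckets), table.fetch("word99"))
```

Doubling keeps the total rehashing work proportional to the number of insertions, so the occasional expensive growth step averages out to a small constant per insert.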