in reply to About the expected number of comparisons for n keys into k buckets
There is a lot of tuning and just plain raw running a lot of what are supposed to be realistic test cases involved in picking a hash algorithm.
To the best of my knowledge, the Perl hash algorithm is:
As it turns out, this is a very efficient algorithm with Intel processors to calculate (which is definitely part of any quality benchmark). If you try to replace that integer multiply with some shifts, the performance will just dive into the dirt. That's because on a modern Intel processor an integer multiply is essentially the same as an integer add. I was stunned when I found that out.unsigned int hash = 0; while (*s) /* s is pointer to string */ hash = hash * 33 + *s++; return hash;
I have a C app where I changed the hash key generation algorithm from the Perl one to a different one based upon empirical test data. Hash algorithms appear to be very much an empirical art.
Anyway, there are a lot of variables.
I'm skeptical, a single Q, Quality factor can be derived for a particular algorithm independent of a bunch of assumptions about the input data.
Update: From what I remember, the Perl hash algorithm will double the number of hash indicies when the total number of things in the hash exceeds total number of "starting slots/2". There is an errata to "Advanced Perl Programming, page 340 that points this out. This "when do we get bigger?" is also very important!
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: About the expected number of comparisons for n keys into k buckets
by JavaFan (Canon) on Jul 28, 2011 at 09:24 UTC | |
|
Re^2: About the expected number of comparisons for n keys into k buckets
by ikegami (Patriarch) on Jul 28, 2011 at 04:33 UTC | |
by Marshall (Canon) on Jul 28, 2011 at 05:10 UTC | |
by PerlOnTheWay (Monk) on Jul 29, 2011 at 09:01 UTC | |
by ikegami (Patriarch) on Jul 29, 2011 at 15:11 UTC | |
|
Re^2: About the expected number of comparisons for n keys into k buckets
by PerlOnTheWay (Monk) on Jul 28, 2011 at 04:19 UTC |