Interesting. I think there is a fundamental problem here arising from "For a random hash of "<n"> keys into "<k"> buckets,..". Presuming we can even define what random input means to the hash algorithm, the hash function will not generate random indicies.

There is a lot of tuning and just plain raw running a lot of what are supposed to be realistic test cases involved in picking a hash algorithm.

To the best of my knowledge, the Perl hash algorithm is:

unsigned int hash = 0; while (*s) /* s is pointer to string */ hash = hash * 33 + *s++; return hash;
As it turns out, this is a very efficient algorithm with Intel processors to calculate (which is definitely part of any quality benchmark). If you try to replace that integer multiply with some shifts, the performance will just dive into the dirt. That's because on a modern Intel processor an integer multiply is essentially the same as an integer add. I was stunned when I found that out.

I have a C app where I changed the hash key generation algorithm from the Perl one to a different one based upon empirical test data. Hash algorithms appear to be very much an empirical art. Anyway, there are a lot of variables.
I'm skeptical, a single Q, Quality factor can be derived for a particular algorithm independent of a bunch of assumptions about the input data.

Update: From what I remember, the Perl hash algorithm will double the number of hash indicies when the total number of things in the hash exceeds total number of "starting slots/2". There is an errata to "Advanced Perl Programming, page 340 that points this out. This "when do we get bigger?" is also very important!


In reply to Re: About the expected number of comparisons for n keys into k buckets by Marshall
in thread About the expected number of comparisons for n keys into k buckets by PerlOnTheWay

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.