I'm convinced by the arguments put forward by the apparent current experts in the field. I find the fact that (for example) Bob Jenkins is submitting his entry, Maraca, to the NIST SHA-3 hash competition a convincing argument for his expertise. And I find Bret Mulvey's descriptions of hash function testing strategies clear and intuitive; that they are referenced by many other apparent experts is persuasive.

I'm convinced that, as the input domain is infinite, brute-force testing is impossible; and that any testing performed by sampling the input domain is inadequate.

It is inadequate because it only tests the fairness of the distribution of the outputs within a single, fixed N-bit output domain; but for general-purpose hashing, the function must produce fair distributions at every table size 2^n, where n ranges from some small value, say 3, up to and including N. I'm convinced that because, in general-purpose hashing, hash buckets are allocated and reallocated dynamically as the number of keys grows, there is an inherent and unavoidable need to rely upon binary truncation of the output domain to fairly map the same N-bit hash value to different table sizes 2^n (n <= N).
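The truncation being relied upon here is just masking off the low n bits of the N-bit hash, so the same stored hash value maps consistently into every power-of-two table size as the table grows. A minimal sketch (the 64-bit mixer below is a splitmix64-style finalizer used purely for illustration, not any particular author's function):

```python
def mix64(x):
    """A 64-bit finalizer in the style of splitmix64 (illustrative only)."""
    x &= 0xFFFFFFFFFFFFFFFF
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & 0xFFFFFFFFFFFFFFFF
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & 0xFFFFFFFFFFFFFFFF
    return x ^ (x >> 31)

def bucket(hash_value, n_bits):
    """Truncate an N-bit hash to a 2**n_bits table by keeping the low bits."""
    return hash_value & ((1 << n_bits) - 1)

# One stored 64-bit hash value maps into every table size 2^3 .. 2^32
# purely by masking -- no rehashing of the key is needed on resize.
h = mix64(123456789)
slots = [bucket(h, n) for n in (3, 8, 16, 32)]
```

This is why fairness is required at every prefix length: if the low n bits of the output are biased for any n, every table of size 2^n built by masking inherits that bias.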

And I'm persuaded by the arguments (the avalanche and funnelling effects) that the only way this will happen is if each bit of the input affects, on average, 50% of the bits in the N-bit output. I find the arguments for these effects persuasive and the tests for them understandable. And as the output domain is finite and routinely subdividable, I find the argument for testing strategies that test the effects of bit-wise changes to the inputs upon the outputs a more sensible and statistically valid approach.
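That avalanche criterion is directly measurable: toggle each input bit in turn over many random inputs and estimate, per output bit, how often it flips; the ideal is 0.5 everywhere. A sketch of such a test harness, assuming a MurmurHash3-style 32-bit finalizer as the function under test:

```python
import random

def mix32(x):
    """A 32-bit mixer (MurmurHash3 fmix32-style finalizer, illustrative)."""
    x &= 0xFFFFFFFF
    x ^= x >> 16
    x = (x * 0x85EBCA6B) & 0xFFFFFFFF
    x ^= x >> 13
    x = (x * 0xC2B2AE35) & 0xFFFFFFFF
    x ^= x >> 16
    return x

def avalanche_matrix(f, in_bits=32, out_bits=32, trials=2000):
    """For each input bit i, estimate the probability that each output bit j
    flips when input bit i is toggled. Ideal avalanche: 0.5 in every cell."""
    counts = [[0] * out_bits for _ in range(in_bits)]
    rng = random.Random(42)
    for _ in range(trials):
        x = rng.getrandbits(in_bits)
        hx = f(x)
        for i in range(in_bits):
            diff = hx ^ f(x ^ (1 << i))
            for j in range(out_bits):
                counts[i][j] += (diff >> j) & 1
    return [[c / trials for c in row] for row in counts]

m = avalanche_matrix(mix32)
worst = max(abs(p - 0.5) for row in m for p in row)  # worst-case bias
```

Because the matrix is finite (N x N cells), this gives full coverage of the property under test, unlike sampling an infinite input domain for output fairness.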

Bottom line: I'm convinced on the basis of the above, and much else, that CRC32, which was never designed as a hashing function, is a poor choice when there are other functions that have been designed--and tested--specifically for this purpose. And I am convinced, on the basis of the effects on the output distribution frequencies of bits toggled, that XOR-folding as a mechanism for truncating N-bit hash values is a bad practice.
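Both failings are easy to demonstrate. CRC32 is affine over GF(2): toggling input bit i XORs a fixed pattern into the output regardless of the other input bits, so each output bit flips with probability 0 or 1, never the ~0.5 that avalanche requires. And XOR-folding has visible structure: any value whose two halves are equal folds to zero, and swapping the halves never changes the folded result. A sketch using Python's standard `zlib.crc32`:

```python
import random
import zlib

def crc32_int(x):
    """CRC32 of a 32-bit integer's little-endian byte representation."""
    return zlib.crc32(x.to_bytes(4, "little"))

def xor_fold(h, out_bits=16):
    """Fold a 32-bit value to out_bits by XORing the two halves together
    (the truncation scheme being argued against)."""
    return (h >> out_bits) ^ (h & ((1 << out_bits) - 1))

# CRC32's linearity: for each input bit, the output XOR-difference is the
# SAME constant over many random inputs -- the opposite of avalanche.
rng = random.Random(7)
for i in range(32):
    deltas = {crc32_int(x) ^ crc32_int(x ^ (1 << i))
              for x in (rng.getrandbits(32) for _ in range(100))}
    assert len(deltas) == 1  # one fixed difference per input bit

# XOR-folding's structure: equal halves always cancel to zero, and
# swapped halves always collide.
assert xor_fold(0xFFFFFFFF) == xor_fold(0x00000000) == 0
assert xor_fold(0xABCD1234) == xor_fold(0x1234ABCD)
```

The fold collisions above hold for every pair of values with swapped halves, which is the kind of systematic distortion of output frequencies that simple low-bit masking of a well-mixed hash avoids.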

As such, I will not be using either for my hash function purposes.

Good luck with your research.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^8: 64-bit digest algorithms by BrowserUk
in thread 64-bit digest algorithms by BrowserUk
