in reply to Re^7: 64-bit digest algorithms
in thread 64-bit digest algorithms

I'm convinced by the arguments put forward by the apparent current experts in the field. I find the fact that (for example) Bob Jenkins is submitting his entry Maraca to the NIST SHA-3 hash competition a convincing argument for his expertise. And I find Bret Mulvey's descriptions of hash function testing strategies clear and intuitive, and the fact that they are referenced by many other apparent experts persuasive.

I'm convinced that, as the input domain is infinite, brute-force testing is impossible, and that any testing performed by sampling the input domain is inadequate.

It is inadequate because it only tests the fairness of the distribution of the outputs within a single, fixed N-bit output domain, whereas a general-purpose hash must produce fair distributions for all table sizes of 2^n, where n ranges from some small value, say 3, up to and including N. I'm convinced that, because in general-purpose hashing the hash buckets are allocated and reallocated dynamically as the number of keys grows, there is an inherent and unavoidable need to be able to rely upon binary truncation of the output domain to map the same N-bit hash value fairly into different 2^n (n <= N) table sizes.
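To make that concrete, here is a minimal Perl sketch of binary truncation; the 0xDEADBEEF value is just a hypothetical stand-in for some key's full 32-bit hash:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical full-width hash value; stands in for hash($key).
    my $h = 0xDEADBEEF;

    # Map the same 32-bit hash into tables of 2^n slots for n = 3 .. 8
    # by binary truncation, i.e. masking off all but the low n bits.
    # A general-purpose hash must distribute fairly at every one of
    # these sizes for this to be safe.
    for my $n (3 .. 8) {
        my $bucket = $h & ((1 << $n) - 1);   # index into a 2^n-slot table
        printf "n=%2d  slots=%4d  bucket=%3d\n", $n, 1 << $n, $bucket;
    }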

And I'm persuaded by the arguments (the avalanche and funnelling effects) that the only way this will happen is if each bit of the input affects, on average, 50% of the bits in the N-bit output. I find the arguments for these effects persuasive and the tests for them understandable. And as the output domain is finite and readily subdividable, I find the argument for testing strategies that test the effects of bit-wise changes to the inputs on the outputs a more sensible and statistically valid approach.
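As a rough illustration of that kind of bit-wise testing, here is a small avalanche-style sketch in Perl. It uses Jenkins's one-at-a-time hash merely as a convenient 32-bit subject, and the key length and sample count are arbitrary choices; a hash with good avalanche behaviour should flip, on average, about half of the 32 output bits per toggled input bit:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Jenkins's one-at-a-time hash, masked to 32 bits throughout.
    # (Assumes a perl built with 64-bit integers.)
    sub one_at_a_time {
        my ($key) = @_;
        my $h = 0;
        for my $byte (unpack 'C*', $key) {
            $h = ($h + $byte) & 0xFFFFFFFF;
            $h = ($h + (($h << 10) & 0xFFFFFFFF)) & 0xFFFFFFFF;
            $h ^= $h >> 6;
        }
        $h = ($h + (($h << 3) & 0xFFFFFFFF)) & 0xFFFFFFFF;
        $h ^= $h >> 11;
        $h = ($h + (($h << 15) & 0xFFFFFFFF)) & 0xFFFFFFFF;
        return $h;
    }

    my ($trials, $total_flips) = (0, 0);

    for (1 .. 200) {                                   # 200 random 8-byte keys
        my @bytes = map { int rand 256 } 1 .. 8;
        my $h1    = one_at_a_time(pack 'C*', @bytes);

        # Toggle each input bit in turn and count how many output bits change.
        for my $i (0 .. $#bytes) {
            for my $b (0 .. 7) {
                my @mut = @bytes;
                $mut[$i] ^= 1 << $b;
                my $h2   = one_at_a_time(pack 'C*', @mut);
                my $diff = sprintf '%032b', $h1 ^ $h2;
                $total_flips += ($diff =~ tr/1//);     # population count
                $trials++;
            }
        }
    }

    printf "average output bits flipped per toggled input bit: %.2f (ideal: 16 of 32)\n",
        $total_flips / $trials;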

Bottom line: I'm convinced, on the basis of the above and much else, that CRC32, which was never designed as a hashing function, is a poor choice when there are other functions that have been designed--and tested--specifically for this purpose. And I am convinced, on the basis of the effects on the output distribution frequencies when bits are toggled, that XOR-folding as a mechanism for truncating N-bit hash values is a bad practice.
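For contrast, here is a small sketch (again with a hypothetical 32-bit hash value) of the two truncation mechanisms side by side; the closing comment notes the cancellation effect that folding introduces:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $h = 0xDEADBEEF;                     # hypothetical 32-bit hash value

    # XOR-folding: collapse the top 16 bits onto the bottom 16.
    my $folded    = ($h ^ ($h >> 16)) & 0xFFFF;

    # Binary truncation: simply keep the low 16 bits.
    my $truncated = $h & 0xFFFF;

    printf "folded    = 0x%04X\n", $folded;
    printf "truncated = 0x%04X\n", $truncated;

    # With folding, bit i and bit i+16 of the full hash both land on
    # output bit i; if a single input change happens to flip both,
    # the flips cancel and the folded output does not change at all.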

As such, I will not be using either for my hash function purposes.

Good luck with your research.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^9: 64-bit digest algorithms
by gone2015 (Deacon) on Nov 22, 2008 at 13:47 UTC

    OK, an n-bit hash function where some subset of the bits can be used to provide 3..n-bit hashes requires strong "spreading the influence of the input" properties... otherwise the subsets won't work well.

    If that is your requirement, then CRC is indeed not for you.

    Testing the effectiveness of a hash is hard work and, as you say, limited by the ability to select test data. So I can see the attraction of testing some properties of the hash instead. Hopefully those properties are strongly correlated with effectiveness :-) (I suppose some confidence testing is required...)

    Thanks for the references. I will now "go away" and study :-)