why you don't use cyclic redundancy check algorithms for hashing purposes.
This surprised me. So I've been running a few tests. I've dug out my copy of Knuth, and refreshed my memory on how you calculate the probability of 'c' collisions if you throw 'm' balls at random into 'n' bins. Then I ran three tests.
Each test uses randomly constructed 8-12 byte strings, each byte being 0..255 -- the distribution of string lengths and byte values should be even, assuming my 64-bit random number generator is good (which I believe it is).
The tests were:
take 1M (2^20) strings and count the number of collisions when using (a) CRC-32 and (b) FNV (see below). This was done 500 times, and the resulting distribution of numbers of collisions plotted against the expected distribution. See: Test-1.
take 16,384 (2^14) strings and count the number of collisions when using (a) CRC-32 and (b) FNV, and XORing the MS and LS halves to give a 16-bit hash. This was done 1,000 times, and the resulting distribution of numbers of collisions plotted against the expected distribution. See: Test-2.
as (2) but using (a) the LS and (b) the MS halves of CRC-32 to give a 16-bit hash. This was done 1,000 times. See: Test-3.
Visual inspection of these plots does not suggest that CRC-32 is clearly distinguishable from uniform randomness. The middle part of the FNV distributions is a little taller, suggesting it is a little less variable in it's results. The full 32-bit CRC-32 hash leans a little towards a few more collisions than expected, a little, perhaps...
I'm not sure whether to apply chi-squared or Kolmogorov-Smirnov to the data I have... but the plots don't suggest to me that CRC-32 is a bad hash.
If anyone can suggest a better test or a better statistical approach, or has contrary results, I'd love to hear !
The FNV hash is taken from http://bretm.home.comcast.net/~bretm/hash/6.html. What it comes down to is:
U32 FNV_Hash(U8 * p, int l) { U32 hash = 2166136261 ; while (l--) { hash = (hash * 16777619) ^ *p++ ; } ; return hash ; } ;
In reply to Re^5: 64-bit digest algorithms
by gone2015
in thread 64-bit digest algorithms
by BrowserUk
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |