
I'll read the rest again and try stuff in the morning... In the meantime:

And that brings me back to your collisions graphs. I think you got your math wrong. I simply cannot believe that you should expect 1890 collisions from 16384 randomly chosen samples from a domain of 2**32 possibilities. Birthday paradox or not, that is way, way too high a collision rate. Way too high.

The 1890 collisions is when mungeing the 32-bit hashes into 16-bit hashes -- so it's 16,384 random thingies in 65,536 bins... So roughly speaking:

  1. after 4,096 tosses ~1/16 of the bins are occupied (4,096 / 65,536)...

  2. ...so the next 4,096 tosses give ~4,096 * 1/16 = 256 collisions, and ~2/16 of the bins are occupied...

  3. ...so the next 4,096 tosses give ~4,096 * 2/16 = 512 collisions, and ~3/16 of the bins are occupied...

  4. ...so the next 4,096 tosses give ~4,096 * 3/16 = 768 collisions...

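...total 1,536 -- accepting that this is an underestimate, since it ignores collisions among the first 4,096 tosses and treats the occupancy as fixed during each block of 4,096.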
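
For anyone who wants to check the arithmetic, here is a rough sketch in Python (not the code used for the graphs). It assumes the hashes behave like uniform random tosses into the bins, and the function names are made up for illustration. The first function reproduces the block-by-block estimate above; the second uses the standard expectation for the number of occupied bins, m * (1 - (1 - 1/m)**n), and counts every toss that lands in an already-occupied bin as a collision.

```python
def blockwise_estimate(tosses: int, bins: int, block: int) -> float:
    """Rough estimate: freeze the occupancy at its value at the start of each block."""
    total = 0.0
    for start in range(0, tosses, block):
        # Fraction of bins assumed occupied when this block begins.
        occupied_fraction = start / bins
        total += block * occupied_fraction
    return total


def expected_collisions(tosses: int, bins: int) -> float:
    """Expected number of tosses landing in an already-occupied bin (uniform tosses assumed)."""
    expected_occupied = bins * (1.0 - (1.0 - 1.0 / bins) ** tosses)
    return tosses - expected_occupied


if __name__ == "__main__":
    # The four-step estimate: 16,384 tosses, 65,536 bins, blocks of 4,096.
    print(blockwise_estimate(16384, 65536, 4096))
    # Expectation for the same 16-bit case.
    print(expected_collisions(16384, 65536))
    # The full 32-bit case, for comparison.
    print(expected_collisions(16384, 2**32))
```

Under that uniformity assumption, the 16-bit expectation should come out near the 1890 in the graphs, and the 32-bit expectation should be well below one collision, which is the case the quoted objection actually describes.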