in reply to Re^2: [OT] The statistics of hashing.
in thread [OT] The statistics of hashing.
Thanks syphilis. Your calculations make sense to me, but I'm not sure that they gel with the actual data.
Assuming I've coded your formula correctly (maybe not!), then using 10 hashes & vectors, I get the odds of having seen a dup after 1e9 inserts as (1 - ((4294967295/4294967296)**1e9) ) **10 := 0.00000014949378123.
By that point I had actually seen 13 collisions:
And the figure for 4e9 := 0.00667569553892502 looks way too low to me, given that by then the 10 vectors will be almost fully populated.
I would have expected that calculation (for N=4e9) to yield odds of almost 1?
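For anyone who wants to check my arithmetic, here is a minimal sketch that evaluates the expression quoted above for both insert counts. It is written in Python purely for illustration, and the function and parameter names (`formula_odds`, `inserts`, `hashes`, `bits`) are made up for this example. It assumes 10 independent, uniformly distributed hashes, each feeding its own 2**32-bit vector, which is how I read the formula.

```python
def formula_odds(inserts, hashes=10, bits=2**32):
    """Evaluate (1 - ((bits-1)/bits)**inserts)**hashes, the expression
    quoted in the post. All names here are hypothetical."""
    # Chance that one particular bit of one vector is still clear after
    # `inserts` inserts, assuming each insert sets one uniformly random bit.
    still_clear = ((bits - 1) / bits) ** inserts
    # Chance that the corresponding bit is already set in all `hashes`
    # vectors, assuming the hashes are independent of one another.
    return (1 - still_clear) ** hashes


if __name__ == "__main__":
    for n in (1e9, 4e9):
        print(f"N={n:.0e}: {formula_odds(n):.17f}")
```

Allowing for floating-point rounding, the two printed values should agree with the figures quoted above, so if there is a mistake it is more likely in how I am interpreting the result than in the arithmetic.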