in reply to Re^2: An optimal solution to finding weights for a weighted sum signature.
in thread An optimal solution to finding weights for a weighted sum signature.
With ~2e5, you'll have to do something beside the simple weighted sum (like a modulus) if you want to avoid the bell shape, which tends to be a thing with the sum of independant variables (with identical probability distribution - not your case with the weights - you'd actually get close to a gaussian distribution, a perfect bell)
The sum of multiples of unique primes is less likely to collide.That's not a property of primes I know. The reason primes gives you better result is because their progression is close enough to an exponential one.
What did surprise me was the effect that shuffling the primes had on reducing the collision rates.I actually believed this couldn't be true until I realized you are not working with fixed length input. If you were, you'd just have to apply the same permutation to the string as the one applied to the primes to obtain the same result:
Is the same as:a b c d 3 7 17 29
and so on. But with variable length input, the coefficients on the left are more likely to be used. So you obtain better result by having coefficients that are more distant in the front and the "less efficient" ones at the end. You can start by 1, and then add your coefficient by maximazing the ratio (the distance between the bits two characters will affect is log(K1)-log(K2), so an increasing function of (K1/K2)). Taking the powers of two up to 2^16 that could be: 2^0, 2^16, 2^8, 2^12, 2^4, 2^10, 2^6, 2^14, 2^2, 2^15, 2^1, 2^13, 2^3, 2^11, 2^5, 2^9, 2^7 The biggest ratio you can get with those numbers is of course 2^16. Then you take number that's halfway (in terms of ratio) between: 2^16. 2^12 and 2^4 both have a minimum ratio with one of the previous number of 2^4 (I put 2^12 first because it's the further away from 2^0, which is the coefficient with the highest probability to be used). Then 2^10, 2^6, 2^14 and 2^2. All the remaining order introduce two new 2^0 ratios.b d a c 7 29 3 17
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^4: An optimal solution to finding weights for a weighted sum signature.
by BrowserUk (Patriarch) on Nov 06, 2016 at 00:04 UTC |