With ~2e5, you'll have to do something beside the simple weighted sum (like a modulus) if you want to avoid the bell shape, which tends to be a thing with the sum of independant variables (with identical probability distribution - not your case with the weights - you'd actually get close to a gaussian distribution, a perfect bell)

The sum of multiples of unique primes is less likely to collide.
That's not a property of primes I know. The reason primes gives you better result is because their progression is close enough to an exponential one.

What did surprise me was the effect that shuffling the primes had on reducing the collision rates.
I actually believed this couldn't be true until I realized you are not working with fixed length input. If you were, you'd just have to apply the same permutation to the string as the one applied to the primes to obtain the same result:
a b c d 3 7 17 29
Is the same as:
b d a c 7 29 3 17
and so on. But with variable length input, the coefficients on the left are more likely to be used. So you obtain better result by having coefficients that are more distant in the front and the "less efficient" ones at the end. You can start by 1, and then add your coefficient by maximazing the ratio (the distance between the bits two characters will affect is log(K1)-log(K2), so an increasing function of (K1/K2)). Taking the powers of two up to 2^16 that could be: 2^0, 2^16, 2^8, 2^12, 2^4, 2^10, 2^6, 2^14, 2^2, 2^15, 2^1, 2^13, 2^3, 2^11, 2^5, 2^9, 2^7 The biggest ratio you can get with those numbers is of course 2^16. Then you take number that's halfway (in terms of ratio) between: 2^16. 2^12 and 2^4 both have a minimum ratio with one of the previous number of 2^4 (I put 2^12 first because it's the further away from 2^0, which is the coefficient with the highest probability to be used). Then 2^10, 2^6, 2^14 and 2^2. All the remaining order introduce two new 2^0 ratios.


In reply to Re^3: An optimal solution to finding weights for a weighted sum signature. by Eily
in thread An optimal solution to finding weights for a weighted sum signature. by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.