comment on

I hadn't been able to figure out why the shuffling helped you, either. Before Eily's post, I was going to lean down the road of suggesting comparing your monotonically increasing ramp of weights with a monotonically decreasing ramp of weights, and see whether collisions were affected. And then recommend trying a list of weights which was half ramp + half random (actually, three sets of half-ramp: #1:first half increasing, #2:middle half increasing (quarter random on each side), #3: final half increasing. (And compare all those to one of your fully-shuffled weight setes.) If you saw that the down-ramp and the the half-ramps all had worse collisions than your shuffle, then I would suggest trying a mutator that tried to get rid of rampy-segments: if you found a sequence of weights that were strictly increasing (or generally increasing, with occasional excursions), I would suggest having the mutator pick new random primes or just re-shuffle the rampy section.

Also, I have found that histograms can hide some critical information. With your ordered set of 736000 combinations from the first example, you might want to comapre the time-series of the generated signatures -- probably with the multiple weight lists: ramp up, ramp down, 3 half ramps, and one or two completely random weights, to see if any patterns jump out and give you a hint for what your mutator could do.

Finally, as a last thought: as Eily said, sums of primes don't approach uniqueness. However, products of primes do. I know that purely multiplying primes is unique. I spent some time while I couldn't sleep last night trying to come up with a reasonable way to use that, since you'd obviously quickly overrun your 64bit integer with a product of K=1000 primes. :-) I think I came up with something (after looking up a few primey facts today) that would be a product of two primes.

To fit in 64bits, each prime would have to be less than 2**32. The biggest 32bit prime number ⁽¹⁾ is 4_294_967_291, which is the 203_280_221^{st (2)} prime.
That number of multiplyable-primes is bigger than your expected N≥200_000, so in theory, we could assign two primes (or 1000) to each of your N things, and multiple two primes together.
If you didn't want to just start doling out primes randomly, you could do something like
- @prime[1 .. 200_000] = import_from_list_of_first_million_primes()
- generate two 1000-weight lists @wa and @wb, using your randomly shuffled primes
- for each $comb, generate two sigs $sa and $sb in your sig function
  - even if one of your sigs had a collsion, the chances that both collide at the same time are significanlty smaller
  - return $product = $prime[$sa] * $prime[$sb]

In reply to Re^3: An optimal solution to finding weights for a weighted sum signature. by pryrt
in thread An optimal solution to finding weights for a weighted sum signature. by BrowserUk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.