Re: Finding Nearly Identical Sets

IIRC these structures are called multisets because some elements are repeated in one of your examples.

If I understand your requirements correctly, you can use your approach in a pragmatic way, because any "neighboring" multi sets must have at least 8 digits in common.

Sort the 9 elements of the original multi set
Calculate all distinct 8 element subsets (at most 9)
Create hash keys by joining them
push the original set into a HoA ( or HoH) for each subset.

At the end you'll only need 9 hash look ups to drastically narrow down potential candidates.

NB: That's a pragmatic approach, a detailed survey might show more efficient algorithms.

HTH :)

PS: this problem reminds me of hamming distance of error correcting codes, but I doubt you can easily apply this here.

Cheers Rolf
_{(addicted to the Perl Programming Language and ☆☆☆☆ :)

Je suis Charlie!}

update

I just realized that you already sketched that approach in Re^2: Finding Nearly Identical Sets . Not sure why you say it's ugly, cause a HoH should be quite fast, and you'd need to check anyway, if your input is equidistant to multiple neighbors.

Comment on Re: Finding Nearly Identical Sets

Replies are listed 'Best First'.
Re^2: Finding Nearly Identical Sets by Limbic~Region (Chancellor) on Sep 29, 2016 at 11:54 UTC
LanX, Thanks! I was actually looking at Hilberts Curve. As far as the 9 subset/hash, I outlined that solution yesterday in a response to BrowserUk. I was hoping for something more elegant which is why I asked here. Cheers - L~R	[reply]
Re^3: Finding Nearly Identical Sets by LanX (Saint) on Sep 29, 2016 at 20:53 UTC
Again, for me it is elegant. It's fast, easy to implement and its matching the requirements. Well unless you have more in mind than you told us. Better solutions (like a more straight away normalization) would require your codepoints to have a guaranteed minimal distance. This could certainly be achieved, but well regarding the generality of your question, it'll be far fetched to suppose such a structure. Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!}	[reply]