Re: Help thinking about an alternate algorithm

Essentially, you've described a string comparison problem.
Each 'letter' of the string is a category, so each letter has a different range of possibilities, and the number of categories is the length of the string.
I need to determine the furthest distance two selections are from a list of possible selections.

That bit -- two selections from a list of possible selections -- is muddy. But if we ignore that and only consider this bit:
In other words, comparing every selection in the list against every other selection in the list.

What you are describing is a full cross-comparison using a variation on the 'edit distance'.
The bottom line, I think, is that because you are seeking a relative measure, there is no way to avoid a full cross-comparison.
This is because a difference between the first letters is just as significant as one between the last. Thus any attempt to sort the strings, or to reduce them to numerical values that can be ordered with a view to reducing the number of comparisons, fails.

Maybe if you encode each category as a set of bits, and combine those bits into bit strings, each individual comparison reduces to a matter of xoring the pair of strings and counting the number of bits set in the result.

That should prove to be substantially faster than 5 to 10 individual subtractions and summing of the results.
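A minimal sketch of the xor-and-count idea, in Python for brevity. It rests on assumptions that are not in the thread: that the distance between two selections is simply the number of categories in which they differ, and that each category is encoded one-hot (one bit per possible value). With that encoding, every differing category contributes exactly two set bits to the xor, so the distance is half the bit count. The names `CATEGORY_SIZES`, `encode`, `distance` and `max_pair_distance` are made up for illustration.

```python
from itertools import combinations

# Hypothetical: the number of possible values in each category.
CATEGORY_SIZES = [3, 4, 2, 5, 3]

def encode(selection, sizes=CATEGORY_SIZES):
    """Pack one selection into an int, one-hot encoding each category."""
    bits = 0
    offset = 0
    for value, size in zip(selection, sizes):
        if not 0 <= value < size:
            raise ValueError(f"value {value} out of range for a category of size {size}")
        bits |= 1 << (offset + value)  # set the single bit for this category's value
        offset += size                 # move on to the next category's bit field
    return bits

def distance(a, b):
    """Number of categories in which two encoded selections differ.

    Assumes one-hot fields: a differing category leaves two set bits in a ^ b.
    """
    return bin(a ^ b).count("1") // 2

def max_pair_distance(selections):
    """Full cross-comparison: the number of pairs is unchanged, only each compare is cheaper."""
    encoded = [encode(s) for s in selections]
    return max(distance(a, b) for a, b in combinations(encoded, 2))
```

If the real distance is a sum of per-category subtractions rather than a count of mismatches, a different bit encoding would be needed, and this sketch would not apply as written.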

But I think the best you can hope for is to reduce the cost of each per-pair comparison, rather than the number of per-pair comparisons that must be done. (I think :)

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Help thinking about an alternate algorithm by Limbic~Region (Chancellor) on Jan 15, 2014 at 16:55 UTC
BrowserUk,
I have already maximized the efficiency in determining the distance between two selections using bitmasks. As stated, I know that full cross comparison is necessary in some cases. I was just hoping that I could prune that full cross comparison in cases where I knew a compare couldn't lead to a higher distance than I already have.

There is one trivial case where this short circuiting works (when 2 items are the maximum distance possible). I was hoping there might be others, assuming the data will be mostly similar.

Cheers - L~R
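For readers following along, the trivial short circuit described above might look roughly like the sketch below. It is written in Python and is only an illustration under assumptions: "items" are taken to be selections (tuples with one value per category), the distance is taken to be the number of differing categories, and so the maximum possible distance is the number of categories. `hamming` and `furthest_pair` are hypothetical names, and plain tuples stand in for the bitmasks mentioned above.

```python
from itertools import combinations

def hamming(a, b):
    """Count the categories in which two selections differ (assumed distance measure)."""
    return sum(x != y for x, y in zip(a, b))

def furthest_pair(selections):
    """Return (best distance, best pair, comparisons made).

    Stops early once a pair reaches the maximum possible distance,
    because no later pair can beat it.
    """
    if len(selections) < 2:
        raise ValueError("need at least two selections")
    upper_bound = len(selections[0])  # every category differs
    best, best_pair, comparisons = -1, None, 0
    for a, b in combinations(selections, 2):
        comparisons += 1
        d = hamming(a, b)
        if d > best:
            best, best_pair = d, (a, b)
            if best == upper_bound:
                break  # the one trivial short circuit
    return best, best_pair, comparisons
```

When no pair reaches the upper bound, the loop still visits every pair, which is exactly the full cross comparison.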
Re^3: Help thinking about an alternate algorithm by BrowserUk (Patriarch) on Jan 15, 2014 at 17:02 UTC
There is one trivial case where this short circuiting works (when 2 items are the maximum distance possible).

"two items"? Are items people? Or categories? What is the "maximum distance"? No categories the same. How can you know that until you have compared all categories for the given pair? What are you short circuiting?