Let's say a selection is a particular outfit. Each category (hat, top, bottom, shoes, etc.) has a fixed number of possible choices. Comparing the selections of two people, the distance is the number of categories in which they chose differently. If they picked the same hat, shoes and pants but a different top, then the distance would be 1.
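To make the definition concrete, here is a minimal sketch of that distance in Python. The dict representation, the function name `distance`, and the sample outfits are assumptions for illustration, not part of my actual code.

```python
def distance(a, b):
    """Count the categories in which two outfits differ.

    Each outfit is assumed to be a dict mapping category -> chosen item,
    and both outfits are assumed to cover the same categories.
    """
    if a.keys() != b.keys():
        raise ValueError("selections must cover the same categories")
    return sum(1 for category in a if a[category] != b[category])


# Hypothetical sample data: same hat, bottom and shoes, different top.
alice = {"hat": "fedora", "top": "t-shirt", "bottom": "jeans", "shoes": "boots"}
bob = {"hat": "fedora", "top": "sweater", "bottom": "jeans", "shoes": "boots"}

print(distance(alice, bob))  # 1
```

This is just the Hamming distance over the category slots, so any fixed ordering of the categories would work equally well in place of a dict.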
Obviously, the maximum distance is then the number of categories: in my case, more than 5 but fewer than 10. The number of selections will also be fairly small, no more than a dozen. My current approach is to compute the distance for every pair of selections using a high water mark algorithm. Unfortunately, the only opportunity for short circuiting is when two selections are at the greatest possible distance.
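For reference, the brute-force approach looks roughly like the sketch below. It assumes the goal is the pair of selections with the greatest distance, and that each selection is a tuple with one item per category in a fixed order. The names `farthest_pair` and `selections` are hypothetical.

```python
from itertools import combinations


def distance(a, b):
    """Number of positions (categories) where the two selections differ."""
    return sum(x != y for x, y in zip(a, b))


def farthest_pair(selections):
    """Return (distance, pair) for the most different pair of selections.

    Keeps a high water mark and stops early only when a pair differs in
    every category, since no pair can beat that.
    """
    if len(selections) < 2:
        raise ValueError("need at least two selections")
    n_categories = len(selections[0])
    best, best_pair = -1, None
    for a, b in combinations(selections, 2):
        d = distance(a, b)
        if d > best:
            best, best_pair = d, (a, b)  # new high water mark
            if best == n_categories:     # the only possible short circuit
                break
    return best, best_pair


# Hypothetical sample data: (hat, top, bottom, shoes)
selections = [
    ("fedora", "t-shirt", "jeans", "boots"),
    ("fedora", "sweater", "jeans", "boots"),
    ("cap", "sweater", "shorts", "boots"),
]
print(farthest_pair(selections))
```

With a dozen selections that is at most 66 pairs, each compared over fewer than 10 categories, which is why the brute-force version is already fast enough in practice.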
I want to be clear - there is absolutely nothing wrong with my current solution. I am only seeking wisdom for wisdom's sake. I understand that there can be no better solution in the worst case than what I have already come up with. In my actual data though, most selections will not differ from others by a great amount.
One idea I had was to convert the selection to numbers. Each time a new high water mark is found, only selections that are at least that number away are considered. I haven't worked out a system yet for this.
What ideas do you have?
Cheers - L~R
In reply to Help thinking about an alternate algorithm by Limbic~Region