in reply to Comparing sets of phrases stored in a database?
The first thing you need to do is define how you, as a human being, would judge the similarity of the sets.
For example, you start with a set (A), and you make an exact copy (B). You will (presumably) judge these as very similar.
If the original set contains 100 phrases, and you remove phrases one at a time from the duplicate, does the similarity drop linearly?
Is the ordering of the words within a phrase important?
Do the phrases need to be exactly the same to be counted as similar?
Are you looking for semantic similarity?
Can typos occur? Is it possible for you to correct them?
Are the sets ordered or unordered?
Semantics again.
Once you've decided how you would make the judgement, then you stand some chance of being able to lay out a set of rules. And once you have that, you can start to look for a good way to implement them.
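To make that concrete, here is a minimal sketch of one possible rule set, not a recommendation. It assumes the sets are unordered, that phrases match only when identical after case and whitespace are normalized, and that semantics and typos are ignored. Under those assumptions Jaccard similarity (shared phrases divided by all distinct phrases) is one candidate measure. The names `normalize` and `jaccard` are made up for illustration.

```python
def normalize(phrase):
    # Assumption: case and extra whitespace do not matter, word order does.
    return " ".join(phrase.lower().split())


def jaccard(phrases_a, phrases_b):
    # Treat both collections as unordered sets of normalized phrases.
    a = {normalize(p) for p in phrases_a}
    b = {normalize(p) for p in phrases_b}
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)


original = [f"phrase number {i}" for i in range(100)]
duplicate = list(original)

print(jaccard(original, duplicate))       # 1.0
print(jaccard(original, duplicate[:75]))  # 0.75
```

With this particular rule, removing phrases from the duplicate one at a time does make the score drop linearly. Whether that matches your own judgement is exactly the kind of question you need to answer first.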