in reply to Theory time: Sentence equivalence
I LOVE bread and butter.
LOVE is beautiful.
Love is a verb in the first sentence and a noun (the subject) in the second. The two uses should be kept separate.
As your sample grows, you should be able to get fairly accurate matches by adding up, for each word/part-of-speech pair, its weight times the number of times it appears in the sentence. You only need to look at sentences that contain the key words and have a match percentage over a certain level, which means your heavy-duty algorithm will probably never need to process more than a few dozen sentences, even when the sample holds hundreds of thousands.
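Here is a rough sketch of that scoring idea in Python, just to make it concrete. Everything in it is an assumption on my part: the weight values, the tag names, the `WEIGHTS` table, the function names, and the choice to express the match as shared weight over the larger of the two sentence scores. It also assumes the sentences have already been tagged with parts of speech by some other step.

```python
from collections import Counter

# Hypothetical weights keyed by (word, part of speech); the values are made up.
WEIGHTS = {
    ("love", "VERB"): 2.0,
    ("love", "NOUN"): 3.0,
    ("bread", "NOUN"): 1.5,
    ("butter", "NOUN"): 1.5,
    ("beautiful", "ADJ"): 1.0,
}


def pair_counts(tagged):
    """Count each (lowercased word, part of speech) pair in a tagged sentence."""
    return Counter((word.lower(), pos) for word, pos in tagged)


def score(tagged):
    """Sum weight x occurrence count over the pairs in one sentence."""
    return sum(WEIGHTS.get(pair, 0.0) * n for pair, n in pair_counts(tagged).items())


def match_percentage(a, b):
    """Shared weight of two sentences as a percentage of the larger score."""
    ca, cb = pair_counts(a), pair_counts(b)
    shared = sum(
        WEIGHTS.get(pair, 0.0) * min(ca[pair], cb[pair])
        for pair in ca.keys() & cb.keys()
    )
    total = max(score(a), score(b))
    return 100.0 * shared / total if total else 0.0


def candidates(query, corpus, keywords, threshold=50.0):
    """Keep sentences that contain a key word and match at or above the threshold."""
    keys = {k.lower() for k in keywords}
    hits = []
    for sent in corpus:
        if not keys & {word.lower() for word, _ in sent}:
            continue  # cheap keyword filter before any scoring
        pct = match_percentage(query, sent)
        if pct >= threshold:
            hits.append((pct, sent))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hand-tagged examples; a real system would get the tags from a tagger.
s1 = [("I", "PRON"), ("LOVE", "VERB"), ("bread", "NOUN"),
      ("and", "CONJ"), ("butter", "NOUN")]
s2 = [("LOVE", "NOUN"), ("is", "VERB"), ("beautiful", "ADJ")]
s3 = [("We", "PRON"), ("love", "VERB"), ("bread", "NOUN")]

shortlist = candidates(s1, [s2, s3], {"love"})
```

With these made-up weights, `s2` passes the keyword filter because it contains "love", but it shares no word/part-of-speech pair with `s1`, so it scores zero and is dropped. Only `s3` reaches the shortlist that the heavy-duty algorithm would then look at.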