in reply to word similarity measure
At first glance, words 1, 2, and 4 are 'similar', since they each have the same number of sub-components. A second glance reveals that words 1, 2, and 3 are 'similar': they each contain '101'. And words 2 and 4 are 'similar': they are the only words that contain 148 and 131.
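To make those "glances" concrete, here is a small sketch that groups words by how many sub-components they have and builds an inverted index from each sub-component to the words containing it. The original word list isn't shown in this reply, so the `words` data below is hypothetical, invented only to match the three observations above. The function names are illustrative, not from any library.

```python
from collections import defaultdict

# Hypothetical data: the real word list isn't in this reply, so these
# sub-component lists were made up to match the observations above.
words = {
    1: [101, 7, 9],
    2: [101, 148, 131],
    3: [101, 5],
    4: [148, 131, 12],
}

def group_by_length(words):
    """Map each sub-component count to the ids of words with that many parts."""
    groups = defaultdict(list)
    for wid, parts in words.items():
        groups[len(parts)].append(wid)
    return dict(groups)

def group_by_component(words):
    """Inverted index: map each sub-component to the ids of words containing it."""
    index = defaultdict(list)
    for wid, parts in words.items():
        for part in set(parts):  # set() so a repeated part counts once per word
            index[part].append(wid)
    return dict(index)

print(group_by_length(words))          # {3: [1, 2, 4], 2: [3]}
print(group_by_component(words)[101])  # [1, 2, 3]
print(group_by_component(words)[148])  # [2, 4]
```

Each grouping captures one of the competing notions of 'similar', which is why you need to decide which one (or which mix) you actually mean.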
I suspect that once you have defined your terms, you will be able to write a function that takes two words and returns the degree of 'similarity' between them. Once you have computed all of the pairwise ratings, sort() will let you rank the papers from most alike to least.
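As one possible definition of 'similarity' (an assumption on my part; the post leaves the definition up to you), the sketch below uses Jaccard similarity: the number of shared sub-components divided by the number of distinct sub-components across both words. It scores every pair and sorts the pairs from most alike to least. The `words` data is hypothetical, made up to fit the observations in this post, and `similarity`/`rank_pairs` are illustrative names.

```python
from itertools import combinations

# Hypothetical data, invented to match the observations in the post.
words = {
    1: [101, 7, 9],
    2: [101, 148, 131],
    3: [101, 5],
    4: [148, 131, 12],
}

def similarity(a, b):
    """Jaccard similarity of two words' sub-component sets (0.0 to 1.0)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # treat two empty words as identical
    return len(sa & sb) / len(sa | sb)

def rank_pairs(words):
    """Score every unordered pair of word ids, most alike first."""
    scores = [
        ((i, j), similarity(words[i], words[j]))
        for i, j in combinations(sorted(words), 2)
    ]
    # sorted() is stable, so equal scores keep their pair order.
    return sorted(scores, key=lambda item: item[1], reverse=True)

for (i, j), score in rank_pairs(words):
    print(f"words {i} and {j}: {score:.2f}")
```

With this data, words 2 and 4 come out on top (they share 148 and 131). Note that Jaccard ignores the "same number of sub-components" reading entirely; if that matters to you, fold it into `similarity` as an extra weighted term.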
This sounds like the kind of problem a plagiarism detector is designed for.
----
I Go Back to Sleep, Now.
OGB
Re^2: word similarity measure
by planetscape (Chancellor) on Feb 28, 2009 at 04:28 UTC