PerlMonks
You need to define what you mean by "similarity".
At first glance, words 1, 2, and 4 are 'similar' since they each have the same number of sub-components. A second glance reveals that words 1, 2, and 3 are 'similar': they each contain '101'. And words 2 and 4 are 'similar' because they are the only words that contain 148 and 131. I suspect that once you have defined your terms, you will be able to write a function that takes two words and returns the degree of 'similarity' between them. Once you have all of the pair-wise ratings computed, sort() will let you rank the papers from most alike to least. This sounds like the kind of problem a plagiarism detector is designed for.
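To make the "function plus sort()" idea concrete, here is a rough sketch of one way it could look. Everything specific in it is an assumption on my part: I am guessing that a word is a string of dot-separated components, the sample values in %words are made up to match the observations above, and I picked the Jaccard index (shared components divided by total distinct components) as the definition of 'similarity'. Swap in whatever definition you settle on.

```perl
use strict;
use warnings;

# Hypothetical data: the dot delimiter and these values are assumptions,
# invented only to mirror the observations in the post.
my %words = (
    1 => '101.7.22',
    2 => '101.148.131',
    3 => '101.9',
    4 => '55.148.131',
);

# One possible definition of 'similarity' (an assumption): the Jaccard
# index, i.e. shared components / total distinct components.
sub similarity {
    my ($x, $y) = @_;
    my %in_x = map { ($_ => 1) } split /\./, $x;
    my %in_y = map { ($_ => 1) } split /\./, $y;
    my $shared = grep { $in_y{$_} } keys %in_x;    # count of common components
    my $total  = scalar(keys %in_x) + scalar(keys %in_y) - $shared;
    return $total ? $shared / $total : 0;
}

# Compute the rating for every unordered pair of words.
my @ids = sort { $a <=> $b } keys %words;
my @pairs;
for my $i (0 .. $#ids) {
    for my $j ($i + 1 .. $#ids) {
        my ($p, $q) = @ids[$i, $j];
        push @pairs, [ $p, $q, similarity($words{$p}, $words{$q}) ];
    }
}

# Rank the pairs from most alike to least.
my @ranked = sort { $b->[2] <=> $a->[2] } @pairs;
printf "%s vs %s: %.2f\n", @$_ for @ranked;
```

With this made-up data the pair 2/4 comes out on top, since those two words share two of four distinct components. A different definition (same component count, shared prefix, edit distance) would only change the body of similarity(); the pairing loop and the sort stay the same.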
----
OGB

In reply to Re: word similarity measure by Old_Gray_Bear