Ignoring subtleties about how you developed your keyword->index mapping, the easiest way to measure similarity is to build a hash keyed by your word identifiers and then brute-force a similarity array over every pair of words. Something like:
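    my @count;    # $count[$i][$j] = number of keys shared by words $i and $j, for $j < $i
    for my $i_word (1 .. $#words) {
        for my $j_word (0 .. $i_word - 1) {
            $count[$i_word][$j_word] = 0;
            # Count each key under word $i_word that also appears under word $j_word
            foreach (keys %{ $paper{$i_word} }) {
                if (exists $paper{$j_word}{$_}) {
                    $count[$i_word][$j_word]++;
                }
            }
        }
    }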
If you aren't familiar with lists of lists, take a gander at perllol.
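For a version you can run as-is, here is a minimal sketch of the same brute-force count. It assumes %paper maps each word index to a hash whose keys are the IDs of the papers containing that word; the sample words, the paper IDs, and the shared_counts helper are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: word index => { paper ID => 1 }
my @words = ('graph', 'tree', 'hash');
my %paper = (
    0 => { p1 => 1, p2 => 1, p3 => 1 },
    1 => { p2 => 1, p3 => 1 },
    2 => { p3 => 1, p4 => 1 },
);

# Returns a reference to a lower-triangular list of lists:
# $count->[$i][$j] is the number of papers shared by words $i and $j ($j < $i).
sub shared_counts {
    my ($n_words, $paper) = @_;
    my @count;
    for my $i (1 .. $n_words - 1) {
        for my $j (0 .. $i - 1) {
            # grep in scalar context yields the number of matching keys
            $count[$i][$j] = grep { exists $paper->{$j}{$_} } keys %{ $paper->{$i} };
        }
    }
    return \@count;
}

my $count = shared_counts(scalar @words, \%paper);
for my $i (1 .. $#words) {
    for my $j (0 .. $i - 1) {
        print "$words[$i] / $words[$j]: $count->[$i][$j]\n";
    }
}
```

With this sample data, tree and graph share 2 papers, and hash shares 1 paper with each of the other two. Only the lower triangle is filled in because the count is symmetric, so $count->[$j][$i] would be redundant.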
In reply to Re: word similarity measure by kennethk, in thread word similarity measure by karey3341