Ignoring subtleties about how you may have developed your keyword->index mapping, the easiest way to measure the similarity would be to build a hash of hashes with your word identifiers as the outer keys and the papers each word appears in as the inner keys, and then brute force a similarity array that counts the papers each pair of words shares. Something like:
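my @count;    # $count[$i_word][$j_word] = number of papers containing both words
for my $i_word (1 .. $#words) {
    for my $j_word (0 .. $i_word - 1) {
        $count[$i_word][$j_word] = 0;
        # Walk the papers for word i and check each one against word j
        foreach (keys %{ $paper{$i_word} }) {
            if (exists $paper{$j_word}{$_}) {
                $count[$i_word][$j_word]++;
            }
        }
    }
}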
If you aren't familiar with lists of lists, take a gander at perllol.
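For a version you can run end to end, here is a minimal sketch. The sample data is made up: the %keywords_in hash, the paper IDs p1 to p3, and the %index_of mapping are all hypothetical stand-ins for however you actually built your keyword->index mapping. It assumes %paper is a hash of hashes keyed by word index and then by paper ID, which is what the loop above expects.

```perl
use strict;
use warnings;

# Hypothetical sample data: paper ID => keywords found in that paper.
my %keywords_in = (
    p1 => [qw(perl hash array)],
    p2 => [qw(perl hash)],
    p3 => [qw(array regex)],
);

# Assumed keyword->index mapping: a word's position in @words is its identifier.
my @words = qw(perl hash array regex);
my %index_of;
@index_of{@words} = (0 .. $#words);

# $paper{word index}{paper ID} = 1 when that word occurs in that paper.
my %paper;
for my $id (keys %keywords_in) {
    $paper{ $index_of{$_} }{$id} = 1 for @{ $keywords_in{$id} };
}

# Lower-triangular list of lists: $count[$i][$j] is only filled for $j < $i.
my @count;
for my $i (1 .. $#words) {
    for my $j (0 .. $i - 1) {
        # grep in scalar context returns how many papers of word i also hold word j.
        $count[$i][$j] = grep { exists $paper{$j}{$_} } keys %{ $paper{$i} };
    }
}

# Report the shared-paper count for every pair of words.
for my $i (1 .. $#words) {
    for my $j (0 .. $i - 1) {
        print "$words[$i] / $words[$j]: $count[$i][$j]\n";
    }
}
```

Only the lower triangle is stored because the count is symmetric, so to look up a pair put the larger index first. The grep is the same brute-force comparison as the explicit foreach, just more compact.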
In reply to Re: word similarity measure
by kennethk
in thread word similarity measure
by karey3341