Ignoring subtleties about how you developed your keyword->index mapping, the easiest way to measure similarity is to build a hash of hashes, keyed first by word identifier and then by paper (that's %paper below), and brute-force a similarity array from it. Something like:
    my @count;
    for my $i_word (1 .. $#words) {
        for my $j_word (0 .. $i_word - 1) {
            $count[$i_word][$j_word] = 0;
            # Count the papers in which both words appear.
            foreach (keys %{ $paper{$i_word} }) {
                if (exists $paper{$j_word}{$_}) {
                    $count[$i_word][$j_word]++;
                }
            }
        }
    }
If you aren't familiar with lists of lists, take a gander at perllol.
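
For a concrete feel, here is a small self-contained sketch of the same idea. The word list and paper ids (p1 .. p4) are made up for illustration, and I've assumed %paper maps each word index to a hash of the papers that word appears in. It uses grep in scalar context to count the shared keys, which is equivalent to the inner foreach loop above, not a different method.

```perl
use strict;
use warnings;

# Hypothetical sample data: word index => { paper id => 1 }
my @words = ('perl', 'hash', 'regex');
my %paper = (
    0 => { p1 => 1, p2 => 1, p3 => 1 },
    1 => { p2 => 1, p3 => 1 },
    2 => { p3 => 1, p4 => 1 },
);

my @count;
for my $i_word (1 .. $#words) {
    for my $j_word (0 .. $i_word - 1) {
        # grep in scalar context returns the number of papers shared
        # by word $i_word and word $j_word.
        $count[$i_word][$j_word] =
            grep { exists $paper{$j_word}{$_} } keys %{ $paper{$i_word} };
    }
}

# Print the lower triangle of the similarity array.
for my $i_word (1 .. $#words) {
    for my $j_word (0 .. $i_word - 1) {
        print "$words[$i_word] / $words[$j_word]: $count[$i_word][$j_word]\n";
    }
}
```

Only the lower triangle gets filled in, since the count for (i, j) is the same as for (j, i). The work grows with the square of the number of words, so this is fine for a few thousand words but will drag beyond that.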