karey3341 has asked for the wisdom of the Perl Monks concerning the following question:

If my data looks like this:

word 1: 100 101 101 102 102 102 106 106

word 2: 101 104 106 110 113 129 131 148

word 3: 101 153 175 180 381

word 4: 106 110 113 122 131 137 142 148

word 5: 120 165 169

Here word 1 through word 5 stand for different words, and the numbers are the IDs of the papers in which each word was used as a keyword.

How can I calculate similarity between these words?
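
One possible starting point, offered only as a sketch: treat each word as the set of paper IDs it appears in and compare two words with the Jaccard index (shared papers divided by all papers either word appears in). This assumes repeated IDs such as the three 102s for word 1 can be collapsed to one, which the question does not settle. The names `%papers` and `jaccard` are made up for this illustration.

```perl
use strict;
use warnings;

# Paper IDs per keyword, copied from the data above.
my %papers = (
    'word 1' => [100, 101, 101, 102, 102, 102, 106, 106],
    'word 2' => [101, 104, 106, 110, 113, 129, 131, 148],
    'word 3' => [101, 153, 175, 180, 381],
    'word 4' => [106, 110, 113, 122, 131, 137, 142, 148],
    'word 5' => [120, 165, 169],
);

# Jaccard index: size of intersection / size of union.
# Assumption: duplicate IDs are ignored (each list is treated as a set).
sub jaccard {
    my ($list_a, $list_b) = @_;
    my %in_a = map { $_ => 1 } @$list_a;
    my %in_b = map { $_ => 1 } @$list_b;

    # grep in scalar context returns the number of matching elements.
    my $shared = grep { $in_b{$_} } keys %in_a;
    my $union  = scalar(keys %in_a) + scalar(keys %in_b) - $shared;

    # Two empty lists have no basis for comparison; report 0.
    return $union ? $shared / $union : 0;
}

# Print the score for every unordered pair of words.
my @words = sort keys %papers;
for my $i (0 .. $#words - 1) {
    for my $j ($i + 1 .. $#words) {
        printf "%s vs %s: %.3f\n", $words[$i], $words[$j],
            jaccard($papers{$words[$i]}, $papers{$words[$j]});
    }
}
```

With the data above, word 2 and word 4 share five papers out of eleven distinct ones, giving 5/11 (about 0.455), while word 5 shares no paper with any other word and scores 0 everywhere. If the repeat counts are meaningful, a measure that works on counts, such as cosine similarity over per-paper frequencies, may fit better than this set-based one.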