Thanks, all, for the answers. Finding the similarity between two strings is one (probably the largest) component of my problem. However, there is a second component: finding groups of "matches". I guess I could compare all possible pairs, build a graph with an edge between every pair that is similar enough to count as a "match", and then look for the connected components of that graph, or some such thing. Any thoughts on this second part of the problem? There are any number of ways to do it in practice (Graph.pm or even SQL could probably handle it), but it would be great to hear thoughts on the issue.
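To make the grouping idea concrete, here is a rough sketch in Python rather than with Graph.pm. It is only one possible way to do it: the names `similar` and `group_matches`, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.8 threshold are all assumptions made for illustration, not something settled in this thread. It compares every pair and merges matching strings with a small union-find, so each resulting group is one connected component of the "match" graph.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.8):
    """Hypothetical match test: the 0.8 cutoff is an arbitrary assumption."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def group_matches(strings, is_match=similar):
    """Group strings into connected components of the pairwise match graph."""
    # parent[i] points towards the representative of i's group (union-find).
    parent = list(range(len(strings)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare all pairs; every match is an edge that merges two groups.
    for i, j in combinations(range(len(strings)), 2):
        if is_match(strings[i], strings[j]):
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect the strings under their group's representative.
    groups = {}
    for i, s in enumerate(strings):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())
```

Two caveats with this approach: it makes a comparison for every pair, so the work grows quadratically with the number of strings, and the grouping is transitive, so A and C can land in the same group because both match B even if A and C do not match each other directly. Any other pairwise test can be passed in as `is_match` in place of the assumed `similar`.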