in reply to Fuzzy matching of text strings

I had a similar problem, which led me to write String::Compare. For some reason (which I obviously no longer remember) I didn't use String::Approx.
daniel

Re^2: Fuzzy matching of text strings
by srdst13 (Pilgrim) on Dec 14, 2005 at 18:11 UTC
    Thanks, all, for the answers. Finding the similarity between two strings is one component of my problem, and probably the largest. There is a second component, though: finding groups of "matches". I guess I could score every possible pair, build a graph whose edges connect strings that match, and then take the connected components of that graph as the groups. Any thoughts on this second part of the problem? There are any number of ways to do it in practice (Graph.pm or even SQL could probably handle it), but it would be great to hear thoughts on the issue.

    Thanks again,
    Sean
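The pairwise-graph idea above can be sketched roughly as follows. This is only an illustration, written in Python rather than Perl, and it does not use String::Compare or Graph.pm: `similarity` is a stand-in built on the standard library's `difflib.SequenceMatcher`, the names `similarity` and `group_matches` are made up for the example, and the 0.8 threshold is an arbitrary assumption. Groups are found with a small union-find instead of an explicit graph.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    # Stand-in scorer (assumption): any function returning 0..1 would do here.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_matches(strings, threshold=0.8):
    """Group strings into connected components of the 'similar enough' graph."""
    parent = list(range(len(strings)))  # union-find forest, one node per string

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare all pairs; each pair at or above the threshold is an edge.
    for i, j in combinations(range(len(strings)), 2):
        if similarity(strings[i], strings[j]) >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect the strings under each root.
    groups = {}
    for i, s in enumerate(strings):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


if __name__ == "__main__":
    names = ["john smith", "John Smith", "jon smith", "mary jones"]
    for group in group_matches(names):
        print(group)
```

Two things to keep in mind with this approach: comparing all pairs is quadratic in the number of strings, and connected components chain matches together, so A and C land in the same group whenever A matches B and B matches C, even if A and C score poorly against each other.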
      In fact, I developed each of the test subroutines based on the results of comparisons over a subset of the data. What I did was keep creating new tests and writing A, B and the comparison score to a CSV file. I stopped once I had both a workable cutoff score and few false positives and false negatives. I think you could do it the same way: no need for anything very sophisticated, just a subset of the database and many runs, improving the kinds of tests you make each time.
      daniel
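One possible shape for that tuning loop, again as a hedged sketch in Python rather than the actual String::Compare test code: `score` is a stand-in scorer from `difflib`, `write_scores` and `count_errors` are hypothetical helper names, and the hand-labelled `is_match` flag is an assumption about how false positives and negatives get judged.

```python
import csv
import sys
from difflib import SequenceMatcher


def score(a, b):
    # Stand-in scorer (assumption); replace with whatever tests are being tuned.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def write_scores(pairs, out):
    """Write one CSV row per pair: A, B and the comparison score."""
    writer = csv.writer(out)
    writer.writerow(["A", "B", "score"])
    for a, b in pairs:
        writer.writerow([a, b, f"{score(a, b):.3f}"])


def count_errors(labelled, threshold):
    """Return (false_positives, false_negatives) for a given cutoff score."""
    false_pos = false_neg = 0
    for a, b, is_match in labelled:
        predicted = score(a, b) >= threshold
        if predicted and not is_match:
            false_pos += 1
        elif not predicted and is_match:
            false_neg += 1
    return false_pos, false_neg


if __name__ == "__main__":
    # Tiny hand-labelled sample (made up for illustration).
    sample = [
        ("john smith", "jon smith", True),
        ("john smith", "mary jones", False),
    ]
    write_scores([(a, b) for a, b, _ in sample], sys.stdout)
    for cutoff in (0.5, 0.7, 0.9):
        print(cutoff, count_errors(sample, cutoff))
```

Running this over a subset of the real data after each change to the tests gives a quick view of which cutoff keeps both error counts low.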