in reply to Re^2: Fuzzy matching of text strings
in thread Fuzzy matching of text strings

In fact, I developed each of the test subroutines from the results of comparisons run on a subset of the data. I kept creating new tests and writing A, B and the comparison score to a CSV file. I stopped once I had both a workable cutoff score and few false positives and false negatives. I think you could do it the same way: nothing very sophisticated is needed, just a subset of the database and many runs, improving the tests you apply each time.
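To make that loop concrete, here is a minimal sketch in Python. It is not the code I actually used: the `score` function is only a stand-in built on the standard library's `difflib.SequenceMatcher`, and the `SAMPLE` pairs, their labels and the function names are hypothetical. The idea is just to write A, B and the score to CSV and count false positives and false negatives for a candidate cutoff.

```python
import csv
import io
from difflib import SequenceMatcher


def score(a, b):
    """Stand-in similarity test (an assumption): difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evaluate(pairs, threshold, out):
    """Write A, B, score rows to `out` and count errors at `threshold`.

    `pairs` holds (a, b, is_same) tuples, where is_same is a hand-made label.
    Returns (false_positives, false_negatives).
    """
    writer = csv.writer(out)
    writer.writerow(["A", "B", "score"])
    false_pos = false_neg = 0
    for a, b, is_same in pairs:
        s = score(a, b)
        writer.writerow([a, b, f"{s:.3f}"])
        if s >= threshold and not is_same:
            false_pos += 1  # scored as a match, but labelled different
        elif s < threshold and is_same:
            false_neg += 1  # scored as different, but labelled a match
    return false_pos, false_neg


# Hypothetical hand-labelled subset of the data: (A, B, is_same)
SAMPLE = [
    ("John Smith", "john smith", True),
    ("John Smith", "Jon Smith", True),
    ("John Smith", "Mary Jones", False),
]

if __name__ == "__main__":
    # Try several cutoffs and see how the error counts move.
    for threshold in (0.5, 0.7, 0.9):
        fp, fn = evaluate(SAMPLE, threshold, io.StringIO())
        print(f"threshold={threshold} false_pos={fp} false_neg={fn}")
```

Each run, you would change `score` (or add another test), look at the CSV and the two counts, and repeat until the cutoff separates the pairs well enough for your data.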
daniel