in reply to Fuzzy text matching... again

A few obvious stop words should be removed from the strings before comparing them. I'd also experiment with comparing the parts of the strings in rearranged order, ideally in a way that avoids the combinatorial explosion of trying every possible ordering. Giving additional words (those that appear in only one of the two strings) a low weight could also help.
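Here is a minimal Python sketch of those three ideas, assuming simple word tokens. Stop words are dropped. Each string is reduced to a set of words so that word order no longer matters, which sidesteps trying every rearrangement. Words found in only one string still count, but only with a low weight. The stop-word list, the function names `tokens` and `similarity`, and the 0.2 weight are hypothetical choices that would need tuning on real data.

```python
import re

# Hypothetical stop-word list; tune it for your own data.
STOP_WORDS = {"the", "a", "an", "of", "and", "inc", "ltd"}


def tokens(s: str) -> list[str]:
    """Lower-case, keep runs of ASCII letters/digits, drop stop words."""
    words = re.findall(r"[a-z0-9]+", s.lower())
    return [w for w in words if w not in STOP_WORDS]


def similarity(a: str, b: str, extra_weight: float = 0.2) -> float:
    """Order-independent score in [0, 1].

    Words shared by both strings count fully. Words present in only
    one string count with the low weight `extra_weight`, so they pull
    the score down only a little. Comparing word sets instead of word
    sequences avoids trying every permutation of the words.
    """
    ta, tb = set(tokens(a)), set(tokens(b))
    if not ta and not tb:
        return 1.0  # both strings consisted only of stop words
    shared = len(ta & tb)
    extra = len(ta ^ tb)  # words found in exactly one of the strings
    return shared / (shared + extra_weight * extra)
```

With these settings, `similarity("The Perl Monks", "monks, perl")` returns 1.0 because "the" is dropped and order is ignored, while `similarity("Perl Monks Inc", "Perl Monks Monastery")` is about 0.91: "inc" is a stop word and the extra word "monastery" only counts with weight 0.2.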

If your final goal is a database cleanup, I'd treat all of these algorithms only as aids for the human editor. Presenting them with a structured text file of potential matches is all I'd do. With a suitable format, the editor can cut and paste entries to reassign the remaining errors.
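A minimal sketch of such a file, assuming plain-text records and using `difflib.SequenceMatcher.ratio()` only as a stand-in scorer, since the post names no particular metric. The layout (record on its own line, candidates indented beneath it with their score, best first), the names `write_candidates` and `ratio`, and the 0.5 threshold are all hypothetical choices.

```python
import io
from difflib import SequenceMatcher
from typing import Callable, TextIO


def ratio(a: str, b: str) -> float:
    """Stand-in scorer: case-insensitive difflib similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def write_candidates(
    records: list[str],
    score: Callable[[str, str], float],
    out: TextIO,
    threshold: float = 0.5,
) -> None:
    """Write each record followed by its likely duplicates.

    One block per record: the record itself, then one tab-indented
    line per candidate ("score<TAB>candidate"), best score first,
    then a blank line. Records without any candidate at or above
    `threshold` are left out, so the editor only sees real work.
    """
    for i, rec in enumerate(records):
        cands = [
            (score(rec, other), other)
            for j, other in enumerate(records)
            if j != i
        ]
        cands = [c for c in cands if c[0] >= threshold]
        if not cands:
            continue
        cands.sort(key=lambda c: -c[0])  # highest score first
        out.write(f"{rec}\n")
        for s, other in cands:
            out.write(f"\t{s:.2f}\t{other}\n")
        out.write("\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_candidates(["Perl Monks", "perl monks", "Python"], ratio, buf)
    print(buf.getvalue(), end="")
```

Because every candidate sits on its own indented line, the editor can cut a line from one block and paste it under another record to reassign it. Each pair appears under both of its records; comparing only `j > i` would list every pair once instead.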