First, it helps to strip noise words from the company names, such as Inc, Co, Corp, GmbH, and Ltd.
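A minimal sketch of this cleanup step, written in Python for illustration. The noise list and the function name `strip_noise` are assumptions, not something from the original post; you would extend the list for your own data.

```python
import re

# Hypothetical noise list (an assumption); extend it for your own data.
NOISE_WORDS = {"inc", "co", "corp", "gmbh", "ltd"}

def strip_noise(name):
    """Lowercase a company name and drop legal-form tokens."""
    # Split on anything that is not a letter or digit,
    # so "Inc." and "Inc," are recognised as "inc".
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t and t not in NOISE_WORDS)

print(strip_noise("Acme Widgets, Inc."))
```

Note that this also drops punctuation and case, which is usually what you want before comparing names.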
String::Approx measures the distance between two strings by counting the insertions, deletions, and substitutions needed to transform one into the other.
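To make the idea concrete, here is a small Python sketch of a plain edit distance (Levenshtein). It is not String::Approx's code or API, only an illustration of counting insertions, deletions, and substitutions; the name `edit_distance` is an assumption.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))
```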
A different approach, which worked better for me, was to make a list of all the substrings of length n in each string. I called these n-tuples. For each name in one list, I computed the percentage overlap between its set of n-tuples and the set of n-tuples for each name in the other list. The best value for the tuple length n was three or four.
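The n-tuple comparison can be sketched as follows, in Python for illustration. The post does not say exactly how the percentage was computed, so this assumes one common choice: shared n-tuples divided by all distinct n-tuples in either string. The names `ntuples` and `overlap_percent` are hypothetical.

```python
def ntuples(s, n=3):
    """Return the set of all substrings of length n in s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def overlap_percent(a, b, n=3):
    """Percentage of distinct n-tuples that a and b share (an assumed definition)."""
    ta, tb = ntuples(a, n), ntuples(b, n)
    if not ta or not tb:
        return 0.0  # a string shorter than n has no n-tuples
    return 100.0 * len(ta & tb) / len(ta | tb)

print(sorted(ntuples("acme")))
print(overlap_percent("abcd", "bcde"))
```

Unlike an edit distance, this score does not care much about word order, which may be part of why it suits company names.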
Very close matches could be completely automated this way. For matches that were not so close, I finished the matching task manually. I made a web user interface that had a selection list of the match candidates ranked by the closeness of the match. The closeness was determined by the percentage of n-tuples that matched. I selected the best match for each entry from amongst these top-ranked match candidates.
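A rough Python sketch of that workflow: score every candidate, accept the top one automatically if it is close enough, and otherwise hand back the ranked list for a person to choose from. The cutoff `AUTO_ACCEPT`, the list length `top`, the scoring formula, and all function names are assumptions for illustration; the post gives no specific values.

```python
def ntuples(s, n=3):
    """Return the set of all substrings of length n in s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def closeness(a, b, n=3):
    """Percentage of distinct n-tuples shared by a and b (assumed definition)."""
    ta, tb = ntuples(a, n), ntuples(b, n)
    if not ta or not tb:
        return 0.0
    return 100.0 * len(ta & tb) / len(ta | tb)

def rank_candidates(name, candidates, top=5):
    """Return the top candidates as (score, candidate) pairs, best first."""
    scored = [(closeness(name, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]

AUTO_ACCEPT = 80.0  # hypothetical cutoff; tune it against your own data

def match(name, candidates):
    """Return (automatic match or None, ranked list for manual review)."""
    ranked = rank_candidates(name, candidates)
    if ranked and ranked[0][0] >= AUTO_ACCEPT:
        return ranked[0][1], ranked
    return None, ranked

best, ranked = match("acme widgets", ["apex gadgets", "acme widget"])
print(best, ranked)
```

When `match` returns `None` as the first value, the ranked list is what would populate the selection list in the user interface.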
In reply to Approximate matching of company names
by toma
in thread Some kind of fuzzy logic.
by the_0ne