I think you are venturing into dangerous territory. Updating a data file on the basis of another list whose entries merely "sound" the same is a very loose rule to go by. Can you tighten the rules for when a data record counts as the same as an existing entry? Otherwise I'm afraid not even Perl will do a good job here.
The venerable Soundex algorithm is known to map wildly different words to the same "stem" and that may or may not be what you want.
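To make the collision problem concrete, here is a minimal sketch of the classic American Soundex rules. It is written in Python purely for illustration; the function name `soundex` and the sample words are my own choices, not anything from your data, and a library implementation may differ in edge cases.

```python
import string

# Letter-to-digit table of classic American Soundex.
# Vowels, Y, H and W get no digit at all.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of name (illustrative sketch)."""
    # Keep only ASCII letters, upper-cased.
    letters = [c for c in name.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped and do not separate equal codes.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # A vowel resets prev, so an equal code after it is emitted again.
        prev = code
        if len(result) == 4:
            break
    # Pad short codes with zeros.
    return (result + "000")[:4]


if __name__ == "__main__":
    for word in ("Smith", "Sunday", "Senate", "Summit"):
        print(word, soundex(word))
```

All four sample words come out as `S530`, so a merge keyed on Soundex alone would treat "Smith", "Sunday", "Senate" and "Summit" as the same record.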
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law