While writing an MP3-renaming script, which tackles a problem similar to yours, I used a combination of Metaphone and string-distance modules. My approach:
  1. Get a list of "known-good" words. I use already-verified MP3 filenames as a source of these.
  2. Calculate their Metaphones.
  3. Calculate the Metaphone of each new word and look for exact matches. If there are none, look for matches within a distance of 1 or 2. Allowing distances larger than 2 produces too many candidates.
  4. Have the user confirm the 'corrections'.
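The steps above can be sketched roughly as follows. This is a minimal Python sketch, not the original script. Python's standard library has no Metaphone, so the sketch substitutes American Soundex as the phonetic key; a real Metaphone implementation (for example `metaphone` from the third-party `jellyfish` package) could be dropped in instead. It also assumes the "distance" in step 3 is the Levenshtein edit distance between phonetic keys; measuring it between the raw words is an equally plausible reading. The names `build_index`, `suggest` and `confirm` are hypothetical.

```python
from collections import defaultdict

# Soundex digit groups (stand-in for Metaphone, which the stdlib lacks).
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}

def soundex(word):
    """American Soundex key, e.g. 'Robert' -> 'R163'."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "hw":          # h/w do not separate equal codes; vowels do
            prev = digit
    return (result + "000")[:4]

def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def build_index(known_words):
    """Steps 1-2: map each phonetic key to the known-good words sharing it."""
    index = defaultdict(set)
    for w in known_words:
        index[soundex(w)].add(w)
    return index

def suggest(word, index, max_distance=2):
    """Step 3: exact key match first, else keys within max_distance, nearest first."""
    key = soundex(word)
    if key in index:
        return sorted(index[key])
    scored = []
    for other_key, words in index.items():
        d = levenshtein(key, other_key)
        if d <= max_distance:
            scored.extend((d, w) for w in words)
    return [w for _, w in sorted(scored)]

def confirm(word, candidates, ask=input):
    """Step 4: a human accepts a correction; the word is kept if none is accepted."""
    for cand in candidates:
        if ask(f"Replace {word!r} with {cand!r}? [y/N] ").strip().lower() == "y":
            return cand
    return word

if __name__ == "__main__":
    index = build_index(["Beatles", "Zeppelin", "Nirvana", "Radiohead"])
    print(suggest("Beatels", index))    # exact phonetic match
    print(suggest("Seppelin", index))   # found only via the distance fallback
```

With this sample list, `Beatels` shares the key `B342` with `Beatles`, while `Seppelin` (`S145`) reaches `Zeppelin` (`Z145`) only through the distance-1 fallback. Passing `ask` as a parameter keeps the confirmation step replaceable, e.g. by a GUI prompt or a scripted answer.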
It's not an exact science, and human intervention is unavoidable if correctness matters.