in reply to Closest matches from string array

Soundex is a good keyword to search for, and there is Text::Soundex to make life easy. You would want to add a soundex field to your DB, index it, and then soundex the search term before looking it up. Fuzzy matching on names is something it does pretty well (for English). For a number of other ways to do approximate matching, you may also find Re: Duplicate detection (SQL) of interest, along with associated threads like Module for comparing text and Closest match Display.
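To make the "soundex field, index it, soundex the search term" idea concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: the soundex() function is hand-rolled from the commonly described American Soundex rules (it is not Text::Soundex and may differ from the module in edge cases), the dictionary stands in for an indexed DB column, and the names build_index and the sample data are made up for the example. In Perl you would simply call soundex() from Text::Soundex instead.

```python
from collections import defaultdict

# Letter-to-digit table for classic American Soundex.
_CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(name):
    """Return a 4-character Soundex code, or '' if name has no ASCII letters."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # Only emit a digit when it differs from the previous coded letter.
        if digit and digit != prev:
            code += digit
        # Vowels reset the "previous code"; h and w do not.
        if ch not in "hw":
            prev = digit
    # Pad with zeros and truncate to letter + three digits.
    return (code + "000")[:4]


def build_index(names):
    """Hypothetical stand-in for an indexed soundex column: code -> names."""
    index = defaultdict(list)
    for name in names:
        index[soundex(name)].append(name)
    return index


# Made-up sample data, purely for illustration.
names = ["Smith", "Smyth", "Schmidt", "Robert", "Rupert", "Jones"]
index = build_index(names)

# Soundex the search term, then look up the stored code.
matches = index.get(soundex("Smithe"), [])
print(matches)
```

The lookup is an exact match on the stored code, which is why indexing the soundex field works: the fuzziness lives in the encoding, not in the query.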

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: Closest matches from string array
by tbone1 (Monsignor) on Oct 27, 2003 at 12:41 UTC

    I'll back up what tachyon said about Text::Soundex, with one caveat: it depends on your data. I looked into this for something at work, but as we are a medical company, some of the terminology is ... how do I put this ... not phonetically straightforward. There are a lot of Greek and Latin chunks in our terms, and sometimes Greek, Latin, and English components are intermingled in the same word. And English is still more or less a Germanic language. Sorta.

    If you are dealing with last names, as your example implies, those can vary even more widely. I would suggest that you familiarize yourself with the data, if possible, before you make a decision, but 1) we all know to do that whatever the project, and 2) it isn't always possible.

    --
    tbone1
    Ain't enough 'O's in 'stoopid' to describe that guy.
    - Dave "the King" Wilson

      Coming from a medical background, I know just what you mean. It's not only that mishmash of Greek and Latin; nowadays the drug company marketing guys just make stuff up because it sounds good. Loperamide, Ondansetron, Vasocardil, Isocover, Vancomycin, Tofranil, Xanax, Celebrex, and who could have missed Viagra (good for spam at least, it seems :-)

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print