If you do take the hash approach, you might also want to consider letter frequency. Since 's' is a more common letter than 'q', does that mean 'said' is a closer match to 'laid' than 'qaid' is? (Yes, 'qaid' is a word.)
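One possible way to turn that question into code is a frequency-weighted Hamming distance, sketched below under several assumptions: it only handles equal-length words, the frequency table holds rough, commonly cited English percentages (not authoritative data), and the choice to make rare letters cost *more* is just one interpretation. The name `weighted_distance` is hypothetical and not part of any module.

```perl
use strict;
use warnings;

# Approximate English letter frequencies in percent (ballpark figures,
# used here only for illustration).
my %FREQ = (
    e => 12.7, t => 9.1, a => 8.2, o => 7.5, i => 7.0,  n => 6.7,
    s => 6.3,  h => 6.1, r => 6.0, d => 4.3, l => 4.0,  c => 2.8,
    u => 2.8,  m => 2.4, w => 2.4, f => 2.2, g => 2.0,  y => 2.0,
    p => 1.9,  b => 1.5, v => 1.0, k => 0.8, j => 0.15, x => 0.15,
    q => 0.10, z => 0.07,
);

# Hypothetical weighted Hamming distance: each mismatched position costs
# the average of the inverse frequencies of the two letters involved,
# so swapping in a rare letter counts as a bigger difference.
sub weighted_distance {
    my ($x, $y) = (lc $_[0], lc $_[1]);
    die "words must be the same length\n" unless length($x) == length($y);
    my $cost = 0;
    for my $i (0 .. length($x) - 1) {
        my ($p, $q) = (substr($x, $i, 1), substr($y, $i, 1));
        next if $p eq $q;
        # Characters missing from the table are treated as very rare.
        my $fp = $FREQ{$p} // 0.05;
        my $fq = $FREQ{$q} // 0.05;
        $cost += (1 / $fp + 1 / $fq) / 2;
    }
    return $cost;
}

printf "said vs laid: %.3f\n", weighted_distance('said', 'laid');
printf "qaid vs laid: %.3f\n", weighted_distance('qaid', 'laid');
```

Under this particular weighting, 'said' scores as the closer match to 'laid'; flipping the weights (rare letters cost less) would give the opposite answer, which is exactly the judgment call the question raises.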
Also, did you know that String::Approx lets you adjust the number of "edits"? For example, to allow up to two characters of difference, you can specify:

    my @catches = amatch("plugh", ['2'], @inputs);

You could set the number of edits to 1 and, if that doesn't return a list to examine, just keep increasing it until you get something.
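The "start at 1 and keep increasing" strategy can be sketched without any CPAN modules. This is a stand-in, not String::Approx: String::Approx does its own approximate (by default substring) matching, whereas the sketch below uses plain whole-word Levenshtein distance, so results will differ. The helper names `edit_distance` and `closest_matches` are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic Levenshtein distance: minimum number of single-character
# insertions, deletions and substitutions turning $s into $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @curr = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @curr, min(
                $prev[$j] + 1,            # deletion
                $curr[$j - 1] + 1,        # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Allow 1 edit, then 2, and so on up to $max_edits, stopping at the
# first limit that yields any candidates. Returns (edits, matches),
# or an empty list if nothing matches within $max_edits.
sub closest_matches {
    my ($target, $max_edits, @candidates) = @_;
    for my $edits (1 .. $max_edits) {
        my @hits = grep { edit_distance($target, $_) <= $edits } @candidates;
        return ($edits, @hits) if @hits;
    }
    return;
}

my @inputs = qw(plugh plough xyzzy);
my ($edits, @hits) = closest_matches('plogh', 3, @inputs);
print "closest at $edits edit(s): @hits\n" if defined $edits;
```

With String::Approx installed, the `grep` line could presumably become `amatch($target, [$edits], @candidates)`, keeping the same widening loop around it.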
You may also want to check out Text::Soundex, which encodes words into four-character strings representing what they "sound" like; you can then compare those shorter strings. I don't know how reliable this is, and it only works for English.
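To show what such a four-character code looks like, here is a hand-rolled sketch of classic American Soundex (first letter kept, remaining consonants mapped to digits, adjacent duplicates collapsed, vowels dropped, padded with zeros). It is not Text::Soundex's code and may disagree with it on edge cases; `soundex_sketch` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch of American Soundex.
sub soundex_sketch {
    my ($word) = @_;
    $word = uc $word;
    $word =~ s/[^A-Z]//g;              # keep letters only
    return '' unless length $word;

    my %code;
    @code{qw(B F P V)}         = (1) x 4;
    @code{qw(C G J K Q S X Z)} = (2) x 8;
    @code{qw(D T)}             = (3) x 2;
    $code{L}                   = 4;
    @code{qw(M N)}             = (5) x 2;
    $code{R}                   = 6;

    my @chars  = split //, $word;
    my $first  = shift @chars;
    my $result = $first;
    my $prev   = $code{$first} // 0;   # first letter counts for duplicates
    for my $c (@chars) {
        next if $c eq 'H' or $c eq 'W';   # H and W don't separate duplicates
        my $d = $code{$c} // 0;           # vowels and Y map to 0
        $result .= $d if $d and $d != $prev;
        $prev = $d;
        last if length($result) == 4;
    }
    return substr($result . '000', 0, 4);
}

for my $pair (['Robert', 'Rupert'], ['said', 'laid']) {
    my ($code1, $code2) = map { soundex_sketch($_) } @$pair;
    printf "%s=%s %s=%s (%s)\n", $pair->[0], $code1, $pair->[1], $code2,
        $code1 eq $code2 ? 'same' : 'different';
}
```

Note how the first letter is kept verbatim, so 'said' and 'laid' get different codes even though they rhyme; that is one reason Soundex is only a rough filter.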
A final option to consider is Text::Metaphone, which also does phonetic encoding of words. You could then check whether two words sound the same (yeah, I know, this is a long shot). I don't know whether it works for languages other than English.
Since this is a "fuzzy" problem, there is no simple solution, and I'm sure you'll have quite a time working on it. However, once finished, it might make a nifty module for CPAN.
Cheers,
Ovid
In reply to (Ovid) Re: Fuzzy Strings by Ovid, in thread Fuzzy Strings by orthanc