Re: compare strings intuitively

Nice replies so far. I would add that any single score returned by String::Compare is probably not intended to be "intuitively" meaningful in itself; instead, the intent seems to be that when looking for "fuzzy matches" among a set of strings, the relative scores might serve as a means for ranking them in terms of their degree of similarity.

For example, if you had a $str3 set to "don dispeigne", comparing this one to the other two would yield scores that were "worse" (lower, meaning "less similar") than the 0.39871 that you got from comparing your first two strings.
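To make the "ranking" idea concrete, here is a minimal sketch in Python rather than Perl. It does not use String::Compare's scoring; it assumes the standard-library `difflib.SequenceMatcher` ratio as a stand-in similarity measure, and the function name and sample strings are made up for illustration. The individual numbers mean little; the ordering is what matters.

```python
from difflib import SequenceMatcher


def rank_by_similarity(query, candidates):
    """Return (candidate, score) pairs sorted from most to least similar.

    The score is difflib's ratio in [0, 1]; it stands in for whatever
    similarity measure you actually use (an assumption of this sketch).
    """
    scored = [(c, SequenceMatcher(None, query, c).ratio()) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical strings, not the ones from the original question.
    candidates = ["goodbye", "hello there world", "hello word"]
    for text, score in rank_by_similarity("hello world", candidates):
        print(f"{score:.3f}  {text}")
```

Any other scoring function could be dropped in place of the ratio call, as long as it is applied consistently to every candidate.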

I think the issue depends on what you intend to do with the "similarity score" (or "difference score") once you have it: what is your purpose in assigning a numeric value to the difference between two strings?

Having looked at the source code for String::Compare, it's hard to say how its scoring would compare to that of Text::Levenshtein (or to any sort of arithmetic result you might be able to derive from using Algorithm::Diff, which is another fine and powerful module).

I like String::Compare's notion of combining results from a variety of fairly simple, linguistically motivated tests (score just the consonants, then just the vowels, then all letters, then all characters, then weight the result in some way according to "word count", etc.). But the implementation in that module seems awfully simple. Having used Algorithm::Diff and read its manual, I'm inclined to think that the problem is not as simple as String::Compare's methods would suggest.
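As a rough illustration of the "combine several simple tests" idea, here is a Python sketch. It is not String::Compare's actual formula: the four views (consonants, vowels, letters, all characters), the equal weighting, and the use of `difflib.SequenceMatcher` for each sub-score are all assumptions of mine, and the word-count weighting is left out.

```python
from difflib import SequenceMatcher

VOWELS = set("aeiou")


def _ratio(a, b):
    # difflib returns 1.0 when both strings are empty.
    return SequenceMatcher(None, a, b).ratio()


def combined_score(a, b):
    """Average the similarity of several filtered views of two strings."""
    a, b = a.lower(), b.lower()
    views = [
        # consonants only
        lambda s: "".join(ch for ch in s if ch.isalpha() and ch not in VOWELS),
        # vowels only
        lambda s: "".join(ch for ch in s if ch in VOWELS),
        # all letters
        lambda s: "".join(ch for ch in s if ch.isalpha()),
        # all characters, unfiltered
        lambda s: s,
    ]
    scores = [_ratio(view(a), view(b)) for view in views]
    # Equal weights are an arbitrary choice for this sketch.
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(combined_score("don quixote", "don dispeigne"))
```

Even in this toy form, the choice of views and weights clearly drives the result, which is part of why a single score is hard to interpret on its own.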

Re^2: compare strings intuitively by bart (Canon) on May 07, 2007 at 12:05 UTC
"[...] you might be able to derive from using Algorithm::Diff"

A personal remark from me about anything based on diff: it performs very badly (or "unintuitively" ;-)) when two adjacent substrings are swapped, which is a very common typo, at least for me. I do it all the time, usually with single letters.
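A small Python sketch of the swapped-letters point, under stated assumptions: it compares a plain Levenshtein distance with a transposition-aware variant (optimal string alignment), both hand-rolled here rather than taken from Text::Levenshtein or Algorithm::Diff. It only covers single-character swaps, not longer swapped substrings.

```python
def levenshtein(a, b):
    """Plain edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def osa_distance(a, b):
    """Edit distance that also counts an adjacent swap as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Adjacent transposition: "he" <-> "eh"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


if __name__ == "__main__":
    print(levenshtein("the", "teh"), osa_distance("the", "teh"))
```

For the typo "teh", the plain distance charges two edits while the transposition-aware one charges a single edit, which is closer to how a human would judge it.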