I'm sorting through a bunch of data with an author field. Sometimes an author is represented as 'a name' and other times as 'am name', things like that. My goal is to work out which authors are the same person.
"what matters to you is the number of operations required to turn one string into another" - yeah, that's pretty much right, but with a few exceptions.
'a name' is obviously different from 'b name'
but
'am name' could be 'a name'
The substitution method wouldn't work on its own here, since a plain count of edit operations scores both pairs as one operation apart.
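
To make the distinction concrete, here is a rough sketch of an edit distance where a substitution costs more than an insertion or deletion. The function name `weighted_edit_distance` and the cost values are my own assumptions for illustration, not something from a particular library, and the weights would need tuning against real author data.

```python
def weighted_edit_distance(a, b, ins_cost=1.0, del_cost=1.0, sub_cost=2.0):
    """Edit distance between a and b with separate costs per operation.

    Hypothetical helper: the default costs are illustrative assumptions.
    """
    # prev[j] holds the cost of turning the first i-1 chars of a
    # into the first j chars of b; row 0 is all insertions.
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Turning i chars of a into an empty string takes i deletions.
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            match_or_sub = prev[j - 1] + (0.0 if ca == cb else sub_cost)
            curr.append(min(prev[j] + del_cost,      # delete ca
                            curr[j - 1] + ins_cost,  # insert cb
                            match_or_sub))           # keep or substitute
        prev = curr
    return prev[-1]


# With equal costs the two pairs look identical:
print(weighted_edit_distance('a name', 'b name', sub_cost=1.0))   # 1.0
print(weighted_edit_distance('a name', 'am name', sub_cost=1.0))  # 1.0

# With substitutions penalised, the extra initial is the closer match:
print(weighted_edit_distance('a name', 'b name'))   # 2.0
print(weighted_edit_distance('a name', 'am name'))  # 1.0
```

This only separates "changed initial" from "extra initial". It does nothing about two different people who share a name, so it is a starting point for ranking candidates rather than a way to decide matches outright.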
Good luck ... this is a tough nut to crack with any non-trivial set of data - the number of false positives is going to be high. Uncommon names will work well, but the M. Smiths of the world are not going to be happy campers.