in reply to Tuning an approximate match to prefer closer lengths

The Levenshtein edit-distance algorithm (I'm not sure whether that's what String::Approx uses) can be parameterized so that character insertions, deletions, and substitutions each carry a different penalty, instead of all costing 1 as in the usual formulation. In fact, it looks like someone has already done this with Text::WagnerFischer.

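Just give insertions and deletions a (much) higher penalty than substitutions. Any length difference between two strings has to be paid for with at least that many insertions or deletions, so the weighted Levenshtein algorithm, which still finds the minimum total cost under those penalties, will rank same-length candidates ahead of longer or shorter ones. You can then pick the closest string from a list of candidates.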
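To make the idea concrete, here is a rough, language-neutral sketch in Python of the weighted dynamic program. It is not Text::WagnerFischer's API or anything String::Approx does internally; the function names `weighted_distance` and `closest` and the default costs (3 for an insertion or deletion, 1 for a substitution) are made up for illustration.

```python
def weighted_distance(a, b, indel=3, subst=1):
    """Edit distance where insertions/deletions cost `indel` and
    substitutions cost `subst` (hypothetical defaults, tune to taste)."""
    # prev[j] = cost of turning the empty prefix of `a` into b[:j]
    prev = [j * indel for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel]  # cost of deleting the first i characters of `a`
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel,                            # delete ca
                curr[j - 1] + indel,                        # insert cb
                prev[j - 1] + (0 if ca == cb else subst),   # match or substitute
            ))
        prev = curr
    return prev[-1]


def closest(target, candidates, indel=3, subst=1):
    """Return the candidate with the lowest weighted distance to `target`."""
    return min(candidates, key=lambda c: weighted_distance(target, c, indel, subst))


if __name__ == "__main__":
    words = ["colour", "colon"]
    # With unit costs both words are one edit away from "color";
    # with expensive insertions the same-length word is preferred.
    print(closest("color", words))
```

With all penalties set to 1 this reduces to the ordinary Levenshtein distance, so the only tuning knob is the ratio between `indel` and `subst`.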

blokhead
