in reply to Levenstein distance transcription

It's a fairly popular algorithm; you can find Perl implementations on both Rosetta Code and Wikibooks. For your convenience, I'm posting a copy of the Wikibooks version below:

    use List::Util qw(min);

    sub levenshtein {
        my ($str1, $str2) = @_;
        my @ar1 = split //, $str1;
        my @ar2 = split //, $str2;

        # $dist[$i][$j] = distance between the first $i characters
        # of $str1 and the first $j characters of $str2
        my @dist;
        $dist[$_][0] = $_ foreach (0 .. @ar1);
        $dist[0][$_] = $_ foreach (0 .. @ar2);

        foreach my $i (1 .. @ar1){
            foreach my $j (1 .. @ar2){
                my $cost = $ar1[$i - 1] eq $ar2[$j - 1] ? 0 : 1;
                $dist[$i][$j] = min(
                    $dist[$i - 1][$j] + 1,          # deletion
                    $dist[$i][$j - 1] + 1,          # insertion
                    $dist[$i - 1][$j - 1] + $cost   # substitution (or match)
                );
            }
        }
        return $dist[@ar1][@ar2];
    }

Update: I'm sorry, I missed the "based on words" part of your question. Thankfully, Eily has already answered it, having read your question more carefully than I did.

- Luke

Re^2: Levenstein distance transcription
by Eily (Monsignor) on Dec 05, 2014 at 13:48 UTC

    This one is fairly easy to turn into a word-based equivalent: you just have to replace split // with split /\W+/. However, it only returns a distance, not a description of the differences as Text::EditTranscript does.
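    Following that suggestion, here is a minimal sketch of a word-based variant that also walks the distance matrix backwards to recover one cheapest sequence of edits. The name word_levenshtein and the one-letter codes (M = match, S = substitute, I = insert, D = delete) are hypothetical choices for this sketch; it is not Text::EditTranscript's API, and its output format is not claimed to match that module's. It also drops the empty leading field that split /\W+/ returns when a string starts with a non-word character.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: word-based Levenshtein distance plus a simple
# transcript (M/S/I/D codes are this sketch's own, not Text::EditTranscript's).
sub word_levenshtein {
    my ($str1, $str2) = @_;
    # grep removes the empty leading field that split /\W+/ yields
    # when a string starts with a non-word character.
    my @ar1 = grep { length } split /\W+/, $str1;
    my @ar2 = grep { length } split /\W+/, $str2;

    my @dist;
    $dist[$_][0] = $_ foreach 0 .. @ar1;
    $dist[0][$_] = $_ foreach 0 .. @ar2;
    foreach my $i (1 .. @ar1) {
        foreach my $j (1 .. @ar2) {
            my $cost = $ar1[$i - 1] eq $ar2[$j - 1] ? 0 : 1;
            $dist[$i][$j] = min(
                $dist[$i - 1][$j] + 1,           # delete a word of str1
                $dist[$i][$j - 1] + 1,           # insert a word of str2
                $dist[$i - 1][$j - 1] + $cost,   # match or substitute
            );
        }
    }

    # Walk back from the bottom-right cell; on ties prefer the diagonal
    # step, then a deletion, then an insertion.
    my ($i, $j) = (scalar @ar1, scalar @ar2);
    my @ops;
    while ($i > 0 || $j > 0) {
        if ($i > 0 && $j > 0) {
            my $cost = $ar1[$i - 1] eq $ar2[$j - 1] ? 0 : 1;
            if ($dist[$i][$j] == $dist[$i - 1][$j - 1] + $cost) {
                unshift @ops, $cost ? 'S' : 'M';
                $i--;
                $j--;
                next;
            }
        }
        if ($i > 0 && $dist[$i][$j] == $dist[$i - 1][$j] + 1) {
            unshift @ops, 'D';
            $i--;
        }
        else {
            unshift @ops, 'I';
            $j--;
        }
    }
    return ($dist[@ar1][@ar2], join '', @ops);
}

my ($d, $t) = word_levenshtein('the quick brown fox', 'the slow brown dog jumps');
print "$d $t\n";
```

    For the sample call this prints "3 MSMIS": keep "the", substitute "slow" for "quick", keep "brown", insert "dog", substitute "jumps" for "fox". When several edit sequences are equally cheap, other tools may report a different but equally short transcript.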

Re^2: Levenstein distance transcription
by wollmers (Scribe) on Dec 05, 2014 at 22:41 UTC

    You shouldn't trust Rosetta Code, Wikipedia, Wikibooks, or even pseudocode in algorithmic papers (peer-reviewed or not) unless you have tested it to some reasonable level of "completeness".

    Of the implementations (C and Perl) and algorithm descriptions in the LCS/Align/ASM field that I have collected, only about 10% are reliable enough to use.
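    In the spirit of that advice, here is a rough sketch of what such testing could look like for the Wikibooks routine: a few hand-checked pairs, a comparison against a deliberately naive memoized recursive reference, and basic metric properties (identity, symmetry, triangle inequality, length bounds) on random strings. The helper names reference and random_string, the alphabet, the seed and the iteration count are arbitrary assumptions, and passing these checks is evidence of correctness, not proof.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Routine under test: the Wikibooks version.
sub levenshtein {
    my ($str1, $str2) = @_;
    my @ar1 = split //, $str1;
    my @ar2 = split //, $str2;
    my @dist;
    $dist[$_][0] = $_ foreach (0 .. @ar1);
    $dist[0][$_] = $_ foreach (0 .. @ar2);
    foreach my $i (1 .. @ar1) {
        foreach my $j (1 .. @ar2) {
            my $cost = $ar1[$i - 1] eq $ar2[$j - 1] ? 0 : 1;
            $dist[$i][$j] = min($dist[$i - 1][$j] + 1,
                                $dist[$i][$j - 1] + 1,
                                $dist[$i - 1][$j - 1] + $cost);
        }
    }
    return $dist[@ar1][@ar2];
}

# Hypothetical independent reference: the recursive definition, memoized.
sub reference {
    my ($s, $t) = @_;
    my %memo;
    my $rec;
    $rec = sub {
        my ($i, $j) = @_;    # lengths of the prefixes being compared
        return $j if $i == 0;
        return $i if $j == 0;
        my $key = "$i,$j";
        return $memo{$key} if exists $memo{$key};
        my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
        return $memo{$key} = min($rec->($i - 1, $j) + 1,
                                 $rec->($i, $j - 1) + 1,
                                 $rec->($i - 1, $j - 1) + $cost);
    };
    my $result = $rec->(length($s), length($t));
    undef $rec;    # break the closure's reference cycle
    return $result;
}

# Short random strings over a small alphabet, so equal characters are common.
my @alphabet = qw(a b c);
sub random_string {
    my $len = int rand 7;
    return join '', map { $alphabet[rand @alphabet] } 1 .. $len;
}

my @failures;
my @known = (['kitten', 'sitting', 3], ['flaw', 'lawn', 2],
             ['', 'abc', 3], ['abc', '', 3], ['', '', 0]);
for my $case (@known) {
    my ($s, $t, $want) = @$case;
    my $got = levenshtein($s, $t);
    push @failures, "known '$s' vs '$t': got $got, want $want" unless $got == $want;
}

srand 42;    # fixed seed so any failure is reproducible
for (1 .. 500) {
    my ($s, $t, $u) = map { random_string() } 1 .. 3;
    my $d = levenshtein($s, $t);
    push @failures, "reference '$s' '$t'" unless $d == reference($s, $t);
    push @failures, "symmetry '$s' '$t'"  unless $d == levenshtein($t, $s);
    push @failures, "identity '$s'"       unless levenshtein($s, $s) == 0;
    push @failures, "bounds '$s' '$t'"
        unless abs(length($s) - length($t)) <= $d && $d <= max(length($s), length($t));
    push @failures, "triangle '$s' '$t' '$u'"
        unless $d <= levenshtein($s, $u) + levenshtein($u, $t);
}

print @failures ? join("\n", 'FAILED:', @failures) . "\n" : "all checks passed\n";
```

    The same harness can be pointed at any other candidate implementation by swapping the body of levenshtein; a disagreement with the slow reference on some short random input is usually the quickest way to find an off-by-one.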