Re: Levenstein distance transcription

It's a fairly popular algorithm - you can find Perl implementations on both Rosetta Code and Wikibooks. For your convenience, I'm posting a copy of the Wikibooks version below:

 
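use strict;
use warnings;
use List::Util qw(min);

# Returns the character-based Levenshtein distance between two strings.
sub levenshtein {
    my ($str1, $str2) = @_;
    my @ar1 = split //, $str1;
    my @ar2 = split //, $str2;

    # $dist[$i][$j] is the distance between the first $i characters
    # of $str1 and the first $j characters of $str2.
    my @dist;
    $dist[$_][0] = $_ foreach (0 .. @ar1);   # delete all $i characters
    $dist[0][$_] = $_ foreach (0 .. @ar2);   # insert all $j characters

    foreach my $i (1 .. @ar1){
        foreach my $j (1 .. @ar2){
            # Substitution is free when the characters already match.
            my $cost = $ar1[$i - 1] eq $ar2[$j - 1] ? 0 : 1;
            $dist[$i][$j] = min(
                        $dist[$i - 1][$j] + 1,           # deletion
                        $dist[$i][$j - 1] + 1,           # insertion
                        $dist[$i - 1][$j - 1] + $cost ); # substitution or match
        }
    }

    # The arrays are used in scalar context here, i.e. as their lengths.
    return $dist[@ar1][@ar2];
}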

Update: I'm sorry, I missed the "based on words" part of your question. Thankfully, Eily already answered your question, reading it more carefully than I did.

- Luke


Re^2: Levenstein distance transcription by Eily (Monsignor) on Dec 05, 2014 at 13:48 UTC
This one is fairly easy to turn into a word-based equivalent: you just have to replace `split //` with `split /\W+/`. Note, though, that it only returns a distance, not a description of the differences as Text::EditTranscript does.
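For illustration, here is a minimal sketch of the word-based variant described above. The name `levenshtein_words` is hypothetical, and the `grep { length }` is an extra assumption on my part: it drops the empty leading field that `split /\W+/` returns when a string starts with a separator, which would otherwise count as a word.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper name. Same dynamic programming as the character
# version, but the units being compared are words, not characters.
sub levenshtein_words {
    my ($str1, $str2) = @_;

    # split /\W+/ breaks on runs of non-word characters; grep removes the
    # empty leading field produced when the string starts with a separator.
    my @ar1 = grep { length } split /\W+/, $str1;
    my @ar2 = grep { length } split /\W+/, $str2;

    my @dist;
    $dist[$_][0] = $_ foreach (0 .. @ar1);
    $dist[0][$_] = $_ foreach (0 .. @ar2);

    foreach my $i (1 .. @ar1) {
        foreach my $j (1 .. @ar2) {
            # Whole words are compared with eq, so this is case-sensitive.
            my $cost = $ar1[$i - 1] eq $ar2[$j - 1] ? 0 : 1;
            $dist[$i][$j] = min(
                $dist[$i - 1][$j] + 1,
                $dist[$i][$j - 1] + 1,
                $dist[$i - 1][$j - 1] + $cost,
            );
        }
    }

    return $dist[@ar1][@ar2];
}

# Two words differ (quick/slow and fox/dog), so this prints 2.
print levenshtein_words("the quick brown fox", "the slow brown dog"), "\n";
```

Because the split is on `\W+`, punctuation is treated purely as a separator; whether that is what you want depends on your data.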
Re^2: Levenstein distance transcription by wollmers (Scribe) on Dec 05, 2014 at 22:41 UTC
You shouldn't trust Rosetta Code, Wikipedia, Wikibooks, or pseudocode in algorithmic papers (even peer-reviewed ones) unless you have tested it to some level of "complete". In my collection of implementations (C and Perl) and algorithm descriptions in the field around LCS/Align/ASM, only 10% are usable in the sense of being reliable.
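As a starting point for that kind of testing, a small Test::More script could check the character-based routine posted above against a few well-known distances and against symmetry. This is only a sketch: the test cases are my own choice, and passing them is nowhere near a complete test.

```perl
use strict;
use warnings;
use List::Util qw(min);
use Test::More;

# Implementation under test: the character-based routine from this thread.
sub levenshtein {
    my ($str1, $str2) = @_;
    my @ar1 = split //, $str1;
    my @ar2 = split //, $str2;

    my @dist;
    $dist[$_][0] = $_ foreach (0 .. @ar1);
    $dist[0][$_] = $_ foreach (0 .. @ar2);

    foreach my $i (1 .. @ar1) {
        foreach my $j (1 .. @ar2) {
            my $cost = $ar1[$i - 1] eq $ar2[$j - 1] ? 0 : 1;
            $dist[$i][$j] = min(
                $dist[$i - 1][$j] + 1,
                $dist[$i][$j - 1] + 1,
                $dist[$i - 1][$j - 1] + $cost,
            );
        }
    }
    return $dist[@ar1][@ar2];
}

# Each case is [ first string, second string, expected distance ].
my @cases = (
    [ '',         '',        0 ],
    [ '',         'abc',     3 ],
    [ 'kitten',   'sitting', 3 ],
    [ 'flaw',     'lawn',    2 ],
    [ 'saturday', 'sunday',  3 ],
);

for my $case (@cases) {
    my ($s, $t, $want) = @$case;
    is( levenshtein($s, $t), $want, "distance '$s' -> '$t'" );
    # The distance must not depend on argument order.
    is( levenshtein($t, $s), $want, "symmetric '$t' -> '$s'" );
}

done_testing();
```

Edge cases such as empty strings and inputs of unequal length are where copied implementations tend to go wrong, so they are worth including early.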