porl has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I am new to Perl and very stuck. I am trying to create a Perl implementation of the LCS algorithm en.wikipedia.org/wiki/Longest_common_subsequence_problem
I am required to do a comparison of two text files without the use of Text::Diff and I THINK(suggestions welcomed) this is how I should be go about it. I am having trouble converting the pseudo code functions found in the link. This is what I have so far but have no idea if I am on the right track.

sub wikiLCSLength { #$file1 = $_[0]; #$file2 = $_[1]; #$file1 = "i b c d e f g h i"; #$file2 = "a b c d e f f f f"; @m = ("a", "b", "c", "d", "e"); @n = ("a", "b", "c", "e", "e"); $mLength = scalar @m; $nLength = scalar @n; #Initialize the multidimensional array for(my $i = 0; $i<= $mLength; $i++) { for(my $j = 0; $j<= $nLength; $j++) { $C[$i][$j] = 0; } } for($i = 0; $i <= $mLength; $i++) { $C[$i][0] = 0; } for($j = 0; $j <= $nLength; $j++) { $C[0][$j] = 0; } for($i=1; $i<$mLength; $i++) { for($j=1; $j<$nLength; $j++) { if($m[$i] eq $n[$j]) { $C[$i][$j] = $C[$i-1][$j-1] + 1; } else { $C[$i][$j] = max(($C[$i][$j-1]),($C[$i-1][$j])); } } } &wikiBacktrack(\@C, \@m, \@n, $mLength, $nLength); } sub wikiBacktrack { @C = @{$_[0]}; @m = @{$_[1]}; @n = @{$_[2]}; $mLength = $_[3]; $nLength = $_[4]; print("\n $n[5] \n"); #BACKTRACKIN BB if($mLength==0 || $nLength==0) { return (""); } elsif($m[$mLength] eq $n[$nLength]) { return &wikiBacktrack(@C, @m, @n, $mLength-1, $nLength-1) + $m +[$mLength]; } else { if($C[$mLength][$nLength-1] > $C[$mLength-1][$nLength]) { return &wikiBacktrack(@C, @m, @n, $mLength, $nLength-1); } else { return &wikiBacktrack(@C, @m, @n, $mLength-1, $nLength); } } }

Any help would be greatly appreciated. Thanks in advance!

Replies are listed 'Best First'.
Re: LCS algorithm
by roboticus (Chancellor) on Apr 16, 2015 at 11:06 UTC

    porl:

    The first thing I notice is that you're first initializing your @C array to 0, then you're initializing the first row and column to 0. Since they're already zero, you don't need that. So I'd remove those two loops.

    Next, it looks like you're having trouble splitting up your strings. You can do it like so:

    my @list_of_characters = split //, 'a string';

    You're also using C-style loops, like so:

    for(my $i = 0; $i<=$mLength; $i++) { ... }

    You may find it easier to read if you'd switch over to a more perlish style, more like:

    for my $i (0 .. $mLength) { ... }

    You're also using '&' in a function call, which is something you ought to avoid except in certain cases. For general function calls, leave out the '&'.

    You've got some off-by-one errors in wikiBacktrack: if $mLength holds the length of a array @m, then $m[$mLength] doesn't exist. If you want to check the value of the last item in the list, perl has a shorthand: -1 refers to the last item, so you can use that instead. (It also offers the $#m variable, which is the index of the last item if you need that value).

    There are a few other items, but these should get you going.

    Update: Added code tags so an array reference isn't treated as a link.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Thank-you so much for the direction! I am going to get right on it.
Re: LCS algorithm
by hdb (Monsignor) on Apr 16, 2015 at 10:39 UTC

    There is also an LCS implementation in Algorithm::Diff...

    UPDATE: After inspecting the documentation of Text::Diff it turns out that it relies on Algorithm::Diff for its LCS implementation (I did not know that).

Re: LCS algorithm
by Anonymous Monk on Apr 16, 2015 at 10:29 UTC
    Here is how you can know if you're on the right track :)
    my @wantedOutput = qw/ ro sham bo /; my @input = qw/ ro sham bo /; my @output = LCS( @input ); use Data::Dump qw/ dd /; dd( \@wantedOutput, \@output );