comment on

Maybe Algotithm::Diff will perform better. It isnt C but always seems to perform its job well and fast. If you are trying to do this on gene sequences you probably want to do it in C - once you have found an efficient algoritm. It does LCS native and can be easily made to do LCSS. Yes LCS and LCSS are different. The lcss method returns all the common substrings, unsorted. Just iterate through the list and remeber the index of the longest (much more efficient than a sort). You would want a Schwartzian transform on the sort by length if you want to use the multiple results (but if you want more than one LCSS its kinda a contradiction in terms).

NB: Edge case where two (L)CSS are same length second ignored in my simple loop.

use Algorithm::Diff qw(LCS traverse_sequences);

my @seq1 = split //, 'abcdefghijklmnopqrstuvwxyz';
my @seq2 = split //,'flubberabcdubberdofghijklm';
my $lcs    = join '', LCS( \@seq1, \@seq2 );
print "LCS: $lcs\n";

my $lcss = lcss( \@seq1, \@seq2 );
print "All CSS: @$lcss\n";
my $index;
my $length = 0;
for( my $i = 0; $i < @$lcss; $i++ ) {
   next unless length($lcss->[$i])>$length;
   $index = $i;
   $length = length($lcss->[$i]);
}
print "LCSS: ", $lcss->[$index], "\n";
# optional sort method
print "LCSS: ", (sort{length{$a}<=>length{$b}}@$lcss)[-1];

sub lcss {
    my ( $seq1, $seq2 ) = @_;
    my ( @match, $from_match );
    my $i = 0;
    traverse_sequences( $seq1, $seq2, {
        MATCH => sub { $match[$i] .= $seq1->[$_[0]]; $from_match = 1 }
+,
        DISCARD_A => sub { do{$i++; $from_match = 0} if $from_match },
        DISCARD_B => sub { do{$i++; $from_match = 0} if $from_match },
    });
   return \@match;
}

__DATA__
LCS: abcdefghijklm
All CSS: abcd e fghijklm
LCSS: fghijklm
LCSS: fghijklm
[download]

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

In reply to Re: LCCS time complexity by tachyon
in thread LCCS time complexity by rkg

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.