Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

Maybe Algotithm::Diff will perform better. It isnt C but always seems to perform its job well and fast. If you are trying to do this on gene sequences you probably want to do it in C - once you have found an efficient algoritm. It does LCS native and can be easily made to do LCSS. Yes LCS and LCSS are different. The lcss method returns all the common substrings, unsorted. Just iterate through the list and remeber the index of the longest (much more efficient than a sort). You would want a Schwartzian transform on the sort by length if you want to use the multiple results (but if you want more than one LCSS its kinda a contradiction in terms).

NB: Edge case where two (L)CSS are same length second ignored in my simple loop.

use Algorithm::Diff qw(LCS traverse_sequences); my @seq1 = split //, 'abcdefghijklmnopqrstuvwxyz'; my @seq2 = split //,'flubberabcdubberdofghijklm'; my $lcs = join '', LCS( \@seq1, \@seq2 ); print "LCS: $lcs\n"; my $lcss = lcss( \@seq1, \@seq2 ); print "All CSS: @$lcss\n"; my $index; my $length = 0; for( my $i = 0; $i < @$lcss; $i++ ) { next unless length($lcss->[$i])>$length; $index = $i; $length = length($lcss->[$i]); } print "LCSS: ", $lcss->[$index], "\n"; # optional sort method print "LCSS: ", (sort{length{$a}<=>length{$b}}@$lcss)[-1]; sub lcss { my ( $seq1, $seq2 ) = @_; my ( @match, $from_match ); my $i = 0; traverse_sequences( $seq1, $seq2, { MATCH => sub { $match[$i] .= $seq1->[$_[0]]; $from_match = 1 } +, DISCARD_A => sub { do{$i++; $from_match = 0} if $from_match }, DISCARD_B => sub { do{$i++; $from_match = 0} if $from_match }, }); return \@match; } __DATA__ LCS: abcdefghijklm All CSS: abcd e fghijklm LCSS: fghijklm LCSS: fghijklm




In reply to Re: LCCS time complexity by tachyon
in thread LCCS time complexity by rkg

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?

What's my password?
Create A New User
Domain Nodelet?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (6)
As of 2023-02-07 11:48 GMT
Find Nodes?
    Voting Booth?
    I prefer not to run the latest version of Perl because:

    Results (39 votes). Check out past polls.