comment on

I think the code you posted doesn't make a lot of sense, given the problem that you described at the outset. If you're looking for the longest common substring in two strings that are each 3k characters long, and you're doing this across all distinct pairings of 300 such strings, you probably want to look at Algorithm::Diff and its "LCS" function.

I haven't used it much myself; looking at the man page, it seems like you might need to split each string into an array of characters -- something like this (not tested):

#!/usr/bin/perl

use strict;
use Algorithm::Diff qw/LCS/;

my @strings;

# fill @strings with your 300 elements of 3k characters each, then ...

for my $i ( 0 .. $#strings-1 ) {
    my @seq1 = split //, $strings[$i];
    for my $j ( $i+1 .. $#strings ) {
        my @seq2 = split //, $strings[$j];
        my @lcs = LCS( \@seq1, \@seq2 );
        print "LCS for $i :: $j is ", join( "",@lcs,"\n" );
    }
}
[download]

I'm just guessing about that. Do heed Grandfather's request, and show us some data (and maybe some other code you've tried that really is more relevant).

(updated code to fix a variable-name typo and the comment, and to declare @strings; also added spacing in the print statement, to avoid misuse of "::" as a namespace designation. Tested it on a sample text file (380 lines but less than 100 chars/line), and it seemed to do what's desired** -- not tremendously fast, but not impossibly slow.)

(** second update: ** then again, looking at the output, it's not clear that my script is actually making correct use of the module -- I don't seem to be getting a single contiguous "LCS" string from each comparison as I was expecting, but rather a bunch of fragments that are each one or more characters long. Better read further in the man page...)

In reply to Re: Search for identical substrings by graff
in thread Search for identical substrings by bioMan

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Perl Monk, Perl Meditation
	PerlMonks