I think the code you posted doesn't make a lot of sense, given the problem that you described at the outset. If you're looking for the longest common substring in two strings that are each 3k characters long, and you're doing this across all distinct pairings of 300 such strings, you probably want to look at Algorithm::Diff and its "LCS" function.

I haven't used it much myself; looking at the man page, it seems like you might need to split each string into an array of characters -- something like this (not tested):

#!/usr/bin/perl
use strict;
use Algorithm::Diff qw/LCS/;

my @strings;
# fill @strings with your 300 elements of 3k characters each, then ...

for my $i ( 0 .. $#strings-1 ) {
    my @seq1 = split //, $strings[$i];
    for my $j ( $i+1 .. $#strings ) {
        my @seq2 = split //, $strings[$j];
        my @lcs = LCS( \@seq1, \@seq2 );
        print "LCS for $i :: $j is ", join( "",@lcs,"\n" );
    }
}

I'm just guessing about that. Do heed Grandfather's request, and show us some data (and maybe some other code you've tried that really is more relevant).

(updated code to fix a variable-name typo and the comment, and to declare @strings; also added spacing in the print statement, to avoid misuse of "::" as a namespace designation. Tested it on a sample text file (380 lines but less than 100 chars/line), and it seemed to do what's desired** -- not tremendously fast, but not impossibly slow.)

(** second update: ** then again, looking at the output, it's not clear that my script is actually making correct use of the module -- I don't seem to be getting a single contiguous "LCS" string from each comparison as I was expecting, but rather a bunch of fragments that are each one or more characters long. Better read further in the man page...)
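
For what it's worth, the fragments are most likely because the "LCS" in Algorithm::Diff is the longest common *subsequence*, which need not be contiguous, rather than the longest common substring. Here is a minimal sketch of the contiguous version, using plain dynamic programming and no modules. The function name longest_common_substring is my own invention (it is not part of Algorithm::Diff), and the two sample strings are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from any module): returns the longest run of
# contiguous characters that appears in both $s and $t. If several runs
# tie for longest, the one that ends earliest in $s is returned.
sub longest_common_substring {
    my ( $s, $t ) = @_;
    my @s_chars = split //, $s;
    my @t_chars = split //, $t;
    my $s_len   = scalar @s_chars;
    my $t_len   = scalar @t_chars;

    my ( $best_len, $best_end ) = ( 0, 0 );

    # $prev[$j] holds the length of the common run ending at the previous
    # character of $s and character $j of $t (1-based positions).
    my @prev = (0) x ( $t_len + 1 );

    for my $i ( 1 .. $s_len ) {
        my @curr = (0) x ( $t_len + 1 );
        for my $j ( 1 .. $t_len ) {
            next unless $s_chars[ $i - 1 ] eq $t_chars[ $j - 1 ];

            # Matching characters extend the run that ended one step back.
            $curr[$j] = $prev[ $j - 1 ] + 1;
            if ( $curr[$j] > $best_len ) {
                ( $best_len, $best_end ) = ( $curr[$j], $i );
            }
        }
        @prev = @curr;
    }
    return substr( $s, $best_end - $best_len, $best_len );
}

# Made-up sample data, just to show the call.
print longest_common_substring( "xabcdy", "zzabcdq" ), "\n";    # prints "abcd"
```

This does work proportional to the product of the two string lengths for every pair, so with 300 strings of 3k characters (roughly 45,000 pairs) I would expect it to be painfully slow in pure Perl. Treat it as a way to check results on small samples rather than as the final answer.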


In reply to Re: Search for identical substrings by graff
in thread Search for identical substrings by bioMan
