Hi PerlMonks,

I am a beginner in Perl; it is the first programming language I am learning. I want to find the longest common substring, and all the other common substrings, in a set of sequences. I have searched almost all the threads for a script that does this. I came across the script c.pl (given below), which finds the longest common substring of two sequences but not of all the sequences in a set. I have also read that LCSS on CPAN can do this task, and I have gone through its subroutine, but I could not work out how to use it to get the desired result.

Moreover, I would rather use a subroutine inside the script (because I find that easier to understand) than load modules or algorithms at the beginning of the script, because I do not know in which Perl directory (on the C drive) the modules or algorithms should be placed for them to work. I have also tried several scripts, but in vain. One of them is try11.pl (given below), but when I run it, cmd complains about a missing CSS subroutine. May I request the PerlMonks to go through the script c.pl and suggest changes to the code so that it gives the desired results?

Where can I find a detailed but simple text for beginners (one that avoids technical computer-science terms) on using CPAN modules and algorithms in a Perl script, with examples?

The c.pl goes like:

#!/usr/bin/perl
## LONGEST COMMON SUBSTRINGs (LCS):
use warnings;
use strict;

sub lc_substr {
    my ($str1, $str2) = @_;
    my $l_length = 0; # length of longest common substring
    my $len1 = length $str1;
    my $len2 = length $str2;
    my @char1 = (undef, split(//, $str1)); # $str1 as array of chars, indexed from 1
    my @char2 = (undef, split(//, $str2)); # $str2 as array of chars, indexed from 1
    my @lc_suffix;  # "longest common suffix" table
    my @substrings; # list of common substrings of length $l_length

    for my $n1 ( 1 .. $len1 ) {
        for my $n2 ( 1 .. $len2 ) {
            if ($char1[$n1] eq $char2[$n2]) {
                # We have found a matching character. Is this the first matching character, or a
                # continuation of previous matching characters? If the former, then the length of
                # the previous matching portion is undefined; set to zero.
                $lc_suffix[$n1-1][$n2-1] ||= 0;
                # In either case, declare the match to be one character longer than the match of
                # characters preceding this character.
                $lc_suffix[$n1][$n2] = $lc_suffix[$n1-1][$n2-1] + 1;
                # If the resulting substring is longer than our previously recorded max length ...
                if ($lc_suffix[$n1][$n2] > $l_length) {
                    # ... we record its length as our new max length ...
                    $l_length = $lc_suffix[$n1][$n2];
                    # ... and clear our result list of shorter substrings.
                    @substrings = ();
                }
                # If this substring is equal to our longest ...
                if ($lc_suffix[$n1][$n2] == $l_length) {
                    # ... add it to our list of solutions.
                    push @substrings, substr($str1, ($n1-$l_length), $l_length);
                }
            }
        }
    }
    return @substrings;
}

my @result1=lc_substr qw(ABABC BABCA ABCBA);
my $result1=join('',@result1);
my $leng1=length($result1);
print"\n The longest common substring :\n";
print "\n@result1: Length=$leng1 letters\n";
print"\n Other common substrings in order of decreasing lengths are:\n";
my @result2="?";

I have got the following results:

The longest common substring :
BABC: Length=4 letters
Other common substrings in order of decreasing lengths are:??

The expected results should look like:

The longest common substring :
ABC; Length=3
Other common substrings in order of decreasing lengths are:
AB: Length=2
BC: Length=2
BA: Length=2
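One thing visible in c.pl is that lc_substr reads only its first two arguments (my ($str1, $str2) = @_;), so the third sequence is never compared. For short sequences like these, the expected listing can be approximated with a brute-force subroutine that needs no modules. The following is only a minimal sketch, not taken from c.pl or from CPAN: the name common_substrings and the $min_len parameter are hypothetical, and the minimum length of 2 is an assumption based on the expected output above, which lists no single letters. Substrings of equal length come out in order of first appearance in the shortest sequence, so their order may differ from the listing above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of c.pl): returns every substring of at
# least $min_len characters that occurs in ALL of the given strings,
# longest first.
sub common_substrings {
    my ($min_len, @strings) = @_;
    return () unless @strings;

    # Every common substring must occur in the shortest string,
    # so take the candidates from there.
    my $shortest = $strings[0];
    for my $s (@strings) {
        $shortest = $s if length($s) < length($shortest);
    }

    my (%seen, @found);
    for my $len ( reverse($min_len .. length($shortest)) ) {
        for my $start ( 0 .. length($shortest) - $len ) {
            my $candidate = substr($shortest, $start, $len);
            next if $seen{$candidate}++;    # skip repeated candidates
            # index() returns -1 when the candidate is absent from a string.
            my $misses = grep { index($_, $candidate) < 0 } @strings;
            push @found, $candidate unless $misses;
        }
    }
    return @found;
}

my @common = common_substrings(2, qw(ABABC BABCA ABCBA));
print "$_: Length=", length($_), "\n" for @common;
```

This tries every substring of the shortest sequence against every other sequence, so the work grows quickly with sequence length; it is meant for small inputs only.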

I have also tried the script try11.pl given below, but cmd reports an undefined CSS subroutine, which I could not find on CPAN. Here is try11.pl:

#!/usr/bin/perl
## LONGEST COMMON SUBSTRINGS (sorted) from a set of given sequences:
use strict;
use warnings;
sub LCS { # Line 5
    my ($ctx, $a, $b) = @_;
    my ($amin, $amax, $bmin, $bmax) = (0, $#$a, 0, $#$b);
    while ($amin <= $amax and $bmin <= $bmax and $a->[$amin] eq $b->[$bmin]) {
        $amin++;
        $bmin++;
    }
    while ($amin <= $amax and $bmin <= $bmax and $a->[$amax] eq $b->[$bmax]) {
        $amax--;
        $bmax--;
    } # Line 15
    my $h = $ctx->line_map(@$b[$bmin..$bmax]); # line numbers are off by $bmin
    return $amin + _core_loop($ctx, $a, $amin, $amax, $h) + ($#$a - $amax)
        unless wantarray;
    my @lcs = _core_loop($ctx,$a,$amin,$amax,$h);
    if ($bmin > 0) { # Line 20
        $_->[1] += $bmin for @lcs; # correct line numbers
    }
    map([$_ => $_], 0 .. ($amin-1)),
        @lcs,
        map([$_ => ++$bmax], ($amax+1) .. $#$a);
}
sub a {
    my $match = CSS(@_); # line 28
    if ( ref $_[0] eq 'ARRAY' ) {
        @$match = map{$_->[0]}sort{$b->[1]<=>$a->[1]}map{[$_,scalar(@$_)]}@$match
    }
    else { # Line 32
        @$match = map{$_->[0]}sort{$b->[1]<=>$a->[1]}map{[$_,length($_)]}@$match
    }
    return $match;
}
## Data Input & Results: # Line 37
print"\nThe longest common substrings in decreasing order of lengths:\n";
my $result1=a qw(ABABC BABCA ABCBA);
my $leng1=$result1; # Line 40
print"\n$result1; Length=$leng1\n\n";
exit;

Results of cmd for try11.pl:

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\x>cd desktop

C:\Users\x\Desktop>try1.pl

The longest common substrings in decreasing order of lengths:
Undefined subroutine &main::CSS called at C:\Users\DR-SUPRIYO\Desktop\try1.pl line 28.

C:\Users\x\Desktop>

In reply to How can I get the longest and all other common substrings from a set of strings? by supriyoch_2008
