comment on

Hi PerlMonks,

I am a beginner in perl. Perl is the first computer language I am learning. I am interested to get the longest as well as all the common substrings from a set of sequences. I have searched almost all the threads to get the script for this purpose. I have come across a script c.pl given below, which can produce the longest common substring between two sequences but not from all the sequences. I have also read that LCSS in CPAN can do this task and I have gone through the subroutine but could not make out how to use it for getting the desird result.

Moreover, I am more interested in using subroutine in script (because I understand it easily) rather than using modules or algorithms at the beginning of script because I donot know in which directory of perl (within C drive) I should put the modules or algorithms to get to work. I have also tried several scripts but in vain. One of the scripts that I have tried is try11.pl (given below) but the cmd asks for CSS subroutine. May I request perlmonks to go through the script c.pl and to suggest me for changes in the code to get the desired results?

Where shall I get the detailed simple text for beginners (avoiding technical terms of computer science) for using the modules and algorithms of CPAN in perl script (with examples)?

The c.pl goes like:

#!/usr/bin/perl
## LONGEST COMMON SUBSTRINGs (LCS):  
use warnings;
use strict;
sub lc_substr {
  my ($str1, $str2) = @_; 
  my $l_length = 0; # length of longest common substring
  my $len1 = length $str1; 
  my $len2 = length $str2; 
  my @char1 = (undef, split(//, $str1)); # $str1 as array of chars, in
+dexed from 1
  my @char2 = (undef, split(//, $str2)); # $str2 as array of chars, in
+dexed from 1
  my @lc_suffix; # "longest common suffix" table
  my @substrings; # list of common substrings of length $l_length 
  for my $n1 ( 1 .. $len1 ) { 
    for my $n2 ( 1 .. $len2 ) { 
      if ($char1[$n1] eq $char2[$n2]) {
        # We have found a matching character. Is this the first matchi
+ng character, or a
        # continuation of previous matching characters? If the former,
+ then the length of
        # the previous matching portion is undefined; set to zero.
        $lc_suffix[$n1-1][$n2-1] ||= 0;
        # In either case, declare the match to be one character longer
+ than the match of
        # characters preceding this character.
        $lc_suffix[$n1][$n2] = $lc_suffix[$n1-1][$n2-1] + 1;
        # If the resulting substring is longer than our previously rec
+orded max length ...
        if ($lc_suffix[$n1][$n2] > $l_length) {
          # ... we record its length as our new max length ...
          $l_length = $lc_suffix[$n1][$n2];
          # ... and clear our result list of shorter substrings.
          @substrings = ();
        }
        # If this substring is equal to our longest ...
        if ($lc_suffix[$n1][$n2] == $l_length) {
          # ... add it to our list of solutions.
          push @substrings, substr($str1, ($n1-$l_length), $l_length);
        }
      }
    }
  }   
 
  return @substrings;
}
my @result1=lc_substr qw(ABABC BABCA ABCBA);
my $result1=join('',@result1);
my $leng1=length($result1); 
print"\n The longest common substring :\n";
print "\n@result1: Length=$leng1 letters\n";
print"\n Other common substrings in order of decreasing lengths are:\n
+";
my @result2="?";
[download]

I have got the following results:

The longest common substring :
BABC: Length=4 letters
Other common substrings in order of decreasing lengths are:??
[download]

The expected results should look like:

The longest common substring :
ABC; Length=3
Other common substrings in order of decreasing lengths are:
AB: Length=2
BC: Length=2
BA: Length=2
[download]

I have tried the script try11.pl given below. But the cmd asks for CSS subroutine which I could not find in cpan. Here goes the try11.pl

#!/usr/bin/perl
## LONGEST COMMON SUBSTRINGS (sorted) from a set of given sequences:
use strict;
use warnings;
sub LCS {                             # Line 5 
    my ($ctx, $a, $b) = @_;
    my ($amin, $amax, $bmin, $bmax) = (0, $#$a, 0, $#$b);
    while ($amin <= $amax and $bmin <= $bmax and $a->[$amin] eq $b->[$
+bmin]) {
        $amin++;
        $bmin++;
    }
    while ($amin <= $amax and $bmin <= $bmax and $a->[$amax] eq $b->[$
+bmax]) {
        $amax--;
        $bmax--;
    }                                  # Line 15 
    my $h = $ctx->line_map(@$b[$bmin..$bmax]); # line numbers are off 
+by $bmin
    return $amin + _core_loop($ctx, $a, $amin, $amax, $h) + ($#$a - $a
+max)
        unless wantarray;
    my @lcs = _core_loop($ctx,$a,$amin,$amax,$h);
    if ($bmin > 0) {                   # Line 20 
        $_->[1] += $bmin for @lcs; # correct line numbers
    }
    map([$_ => $_], 0 .. ($amin-1)),
        @lcs,
            map([$_ => ++$bmax], ($amax+1) .. $#$a);
    }
sub a {                                    
    my $match = CSS(@_);            # line 28 
    if ( ref $_[0] eq 'ARRAY' ) {
       @$match = map{$_->[0]}sort{$b->[1]<=>$a->[1]}map{[$_,scalar(@$_
+)]}@$match
    }
    else {                          # Line 32 
       @$match = map{$_->[0]}sort{$b->[1]<=>$a->[1]}map{[$_,length($_)
+]}@$match
    }
  return $match;
}
## Data Input & Results:                   # Line 37 
print"\nThe longest common substrings in decreasing order of lengths:\
+n";
my $result1=a qw(ABABC BABCA ABCBA);
my $leng1=$result1;                        # Line 40 
print"\n$result1; Length=$leng1\n\n";
exit;
[download]

Results of cmd for try11.pl:

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.
C:\Users\x>cd desktop
C:\Users\x\Desktop>try1.pl

The longest common substrings in decreasing order of lengths:
Undefined subroutine &main::CSS called at C:\Users\DR-SUPRIYO\Desktop\
+try1.pl line 28.
C:\Users\x\Desktop>
[download]

In reply to How can I get the longest and all other common substrings from a set of strings? by supriyoch_2008

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.