Hi PerlMonks,
I am a beginner in Perl; it is the first programming language I am learning. I want to find the longest common substring, as well as all the other common substrings, from a set of sequences. I have searched many threads for a script that does this. I came across the script c.pl (given below), which finds the longest common substring between two sequences but not across all of them. I have also read that LCSS on CPAN can do this task; I went through its subroutine but could not work out how to use it to get the desired result.
Moreover, I would rather use a subroutine inside the script (because I understand that easily) than load modules at the beginning of the script, because I do not know in which Perl directory (on the C drive) the modules should go for them to work. I have tried several scripts, but in vain. One of them is try11.pl (given below), but cmd complains about a missing CSS subroutine. May I request the monks to go through c.pl and suggest changes to the code to get the desired results?
Where can I find detailed, simple text for beginners (avoiding computer-science jargon) on using CPAN modules in a Perl script, with examples?
The c.pl goes like:
#!/usr/bin/perl
## LONGEST COMMON SUBSTRINGs (LCS):
use warnings;
use strict;

sub lc_substr {
    my ($str1, $str2) = @_;
    my $l_length = 0; # length of longest common substring
    my $len1 = length $str1;
    my $len2 = length $str2;
    my @char1 = (undef, split(//, $str1)); # $str1 as array of chars, indexed from 1
    my @char2 = (undef, split(//, $str2)); # $str2 as array of chars, indexed from 1
    my @lc_suffix;  # "longest common suffix" table
    my @substrings; # list of common substrings of length $l_length

    for my $n1 ( 1 .. $len1 ) {
        for my $n2 ( 1 .. $len2 ) {
            if ($char1[$n1] eq $char2[$n2]) {
                # We have found a matching character. Is this the first matching character, or a
                # continuation of previous matching characters? If the former, then the length of
                # the previous matching portion is undefined; set to zero.
                $lc_suffix[$n1-1][$n2-1] ||= 0;
                # In either case, declare the match to be one character longer than the match of
                # characters preceding this character.
                $lc_suffix[$n1][$n2] = $lc_suffix[$n1-1][$n2-1] + 1;
                # If the resulting substring is longer than our previously recorded max length ...
                if ($lc_suffix[$n1][$n2] > $l_length) {
                    # ... we record its length as our new max length ...
                    $l_length = $lc_suffix[$n1][$n2];
                    # ... and clear our result list of shorter substrings.
                    @substrings = ();
                }
                # If this substring is equal to our longest ...
                if ($lc_suffix[$n1][$n2] == $l_length) {
                    # ... add it to our list of solutions.
                    push @substrings, substr($str1, ($n1-$l_length), $l_length);
                }
            }
        }
    }
    return @substrings;
}

my @result1=lc_substr qw(ABABC BABCA ABCBA);
my $result1=join('',@result1);
my $leng1=length($result1);
print"\n The longest common substring :\n";
print "\n@result1: Length=$leng1 letters\n";
print"\n Other common substrings in order of decreasing lengths are:\n";
my @result2="?";
I have got the following results:
The longest common substring :
BABC: Length=4 letters
Other common substrings in order of decreasing lengths are:??
The expected results should look like:
The longest common substring :
ABC; Length=3
Other common substrings in order of decreasing lengths are:
AB: Length=2
BC: Length=2
BA: Length=2
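For reference, output of this shape can be produced with a plain subroutine and no modules. What follows is only a minimal brute-force sketch, not the method used by c.pl or by any CPAN module: the name common_substrings and its $min_len parameter are made up for illustration, and it assumes the strings are short enough that trying every substring of the shortest one is acceptable. Substrings of equal length are listed alphabetically, which is an arbitrary choice and differs slightly from the order shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every substring of at least $min_len
# characters that occurs in ALL of the given strings, longest first.
sub common_substrings {
    my ($min_len, @strings) = @_;
    return () unless @strings;

    # Every common substring must occur in the shortest string,
    # so take the candidates from there.
    my ($shortest, @others) = sort { length($a) <=> length($b) } @strings;
    my $len = length $shortest;

    my %common;
    for my $start (0 .. $len - 1) {
        for my $size ($min_len .. $len - $start) {
            my $candidate = substr($shortest, $start, $size);
            next if exists $common{$candidate};

            # Count the other strings that do NOT contain the candidate.
            my $missing = grep { index($_, $candidate) < 0 } @others;

            # If it is missing somewhere, any longer candidate starting
            # at the same position is missing too, so stop extending.
            last if $missing;

            $common{$candidate} = 1;
        }
    }

    # Longest first; ties in alphabetical order (an arbitrary choice).
    return sort { length($b) <=> length($a) or $a cmp $b } keys %common;
}

my @found = common_substrings(2, qw(ABABC BABCA ABCBA));
print "$_: Length=", length($_), "\n" for @found;
```

With the three sample strings and a minimum length of 2, this prints ABC first, followed by AB, BA and BC. Passing 1 as the minimum length would also list the single letters.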
I have also tried the script try11.pl, given below, but cmd reports an undefined CSS subroutine, which I could not find on CPAN. Here is try11.pl:
#!/usr/bin/perl
## LONGEST COMMON SUBSTRINGS (sorted) from a set of given sequences:
use strict;
use warnings;
sub LCS { # Line 5
    my ($ctx, $a, $b) = @_;
    my ($amin, $amax, $bmin, $bmax) = (0, $#$a, 0, $#$b);
    while ($amin <= $amax and $bmin <= $bmax and $a->[$amin] eq $b->[$bmin]) {
        $amin++;
        $bmin++;
    }
    while ($amin <= $amax and $bmin <= $bmax and $a->[$amax] eq $b->[$bmax]) {
        $amax--;
        $bmax--;
    } # Line 15

    my $h = $ctx->line_map(@$b[$bmin..$bmax]); # line numbers are off by $bmin
    return $amin + _core_loop($ctx, $a, $amin, $amax, $h) + ($#$a - $amax) unless wantarray;
    my @lcs = _core_loop($ctx,$a,$amin,$amax,$h);
    if ($bmin > 0) { # Line 20
        $_->[1] += $bmin for @lcs; # correct line numbers
    }
    map([$_ => $_], 0 .. ($amin-1)),
    @lcs,
    map([$_ => ++$bmax], ($amax+1) .. $#$a);
}
sub a {
    my $match = CSS(@_); # line 28
    if ( ref $_[0] eq 'ARRAY' ) {
        @$match = map{$_->[0]}sort{$b->[1]<=>$a->[1]}map{[$_,scalar(@$_)]}@$match
    }
    else { # Line 32
        @$match = map{$_->[0]}sort{$b->[1]<=>$a->[1]}map{[$_,length($_)]}@$match
    }
    return $match;
}
## Data Input & Results: # Line 37
print"\nThe longest common substrings in decreasing order of lengths:\n";
my $result1=a qw(ABABC BABCA ABCBA);
my $leng1=$result1; # Line 40
print"\n$result1; Length=$leng1\n\n";
exit;
Results of cmd for try11.pl:
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\x>cd desktop
C:\Users\x\Desktop>try1.pl
The longest common substrings in decreasing order of lengths:
Undefined subroutine &main::CSS called at C:\Users\DR-SUPRIYO\Desktop\try1.pl line 28.
C:\Users\x\Desktop>
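The message above is what Perl prints when a script calls a subroutine that is neither defined in the file nor imported from a module. Note that try11.pl defines a subroutine named LCS but calls one named CSS. The following is a small sketch of how one might check which names are actually defined in a script; the helper sub_exists and the stand-in body of LCS are made up for illustration, and the check relies on the built-in can method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in body: only the fact that LCS is defined matters here.
sub LCS { return "LCS got: @_" }

# Hypothetical helper: true if a subroutine with this name exists
# in package main (the package a plain script runs in).
sub sub_exists {
    my ($name) = @_;
    return main->can($name) ? 1 : 0;
}

for my $name (qw(LCS CSS)) {
    print "$name is ", (sub_exists($name) ? "defined" : "NOT defined"), "\n";
}
```

Run as written, this reports LCS as defined and CSS as not defined, which mirrors the situation in try11.pl.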
In reply to How can I get the longest and all other common substrings from a set of strings? by supriyoch_2008