This problem happens to be of interest to me as well. I think the following code does what you're getting at. It's a little crude, but the best I can do at this point is:
#!/usr/bin/perl
use strict;
use warnings;

my $seq     = "ASPTFHKLDTPRLAKLJHHDFSDA";
my @pattern = ("ST", "P", "RK", "ILVF");

# array of refs to arrays of redundant
# residues within the pattern
my @patternarray;
for my $e (0 .. $#pattern) {
    $patternarray[$e] = [ split //, $pattern[$e] ];
}

my $found;
my $lastmatchpos;
LOOP: until ($found) {
    # deal with the first residue match as a special case
    my $firstclass = join '', @{ $patternarray[0] };
    # test the match itself: $1 can hold a stale value after a failed match
    $seq =~ /([$firstclass])/gc
        or die("Sequence does not contain requested motif.\n");
    $found        = $1;
    $lastmatchpos = pos($seq);    # 1-based position of the first residue

    # all the other residues in the pattern
    for my $e (1 .. $#patternarray) {
        my $class = join '', @{ $patternarray[$e] };
        if ($seq =~ /\G([$class])/gc) {
            $found .= $1;
        }
        else {    # reset matching algorithm
            # resume the search just after the failed start residue
            pos($seq) = $lastmatchpos;
            $found = '';
            next LOOP;
        }
    }
}
print("$found at $lastmatchpos\n");
This only matches the first occurrence of a motif in a given sequence. It should be possible to extend this to return all the matches with a little work. For use of the m/\G.../gc idiom, see perldoc perlop.
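For what it's worth, here is a rough sketch of one way to get all the matches. It is not an extension of the loop above: it swaps in a single regex built from the motif, wrapped in a lookahead so that overlapping hits are reported as well. The sub name find_all_motifs is made up for illustration, and the sketch assumes the residue sets contain only plain letters (nothing that is special inside a character class).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns every
# occurrence of the full motif as [1-based start, matched text].
sub find_all_motifs {
    my ($seq, @pattern) = @_;
    # One character class per motif position, e.g. [ST][P][RK][ILVF]
    my $motif = join '', map { "[$_]" } @pattern;
    my @hits;
    # The lookahead consumes nothing, so overlapping hits are found too;
    # pos() is then the 0-based offset at which the hit starts.
    while ($seq =~ /(?=($motif))/g) {
        push @hits, [ pos($seq) + 1, $1 ];
    }
    return @hits;
}

my @hits = find_all_motifs("ASPTFHKLDTPRLAKLJHHDFSDA", "ST", "P", "RK", "ILVF");
print "$_->[1] at $_->[0]\n" for @hits;
```

On the test sequence this should report the same single hit as the script above (TPRL at 10).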
Hope this helps,
Tim
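Update: Sorry, I just re-read the original request and one of the nuances escaped me. To get the script to just print out the most it can match after having matched the initial residue, I think you can just change the else clause to the following. $found is already non-empty at that point, so the until condition is true, the loop ends, and the partial match gets printed:
        else {
            next LOOP;
        }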
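The same "match as much as you can" behaviour can also be sketched with one regex in which every motif position after the first is optional. This is an alternative to the loop, not what the script does internally; prefix_regex is a made-up name, and again I'm assuming the residue sets are plain letters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds [ST](?:[P](?:[RK](?:[ILVF])?)?)?
# so that every position after the first is optional.
sub prefix_regex {
    my ($first, @rest) = @_;
    my $tail = '';
    # Wrap from the last position inwards; a later position can only
    # match if all the earlier ones did.
    for my $set (reverse @rest) {
        $tail = "(?:[$set]$tail)?";
    }
    return qr/[$first]$tail/;
}

my $seq = "ASPTFHKLDTPRLAKLJHHDFSDA";
my $re  = prefix_regex("ST", "P", "RK", "ILVF");
if ($seq =~ /($re)/) {
    # $-[1] is the 0-based offset where capture group 1 starts
    print "$1 at ", $-[1] + 1, "\n";
}
```

Like the modified else clause, this stops at the first residue that matches the first motif position and reports whatever follows it, which on the test sequence is SP at 2.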
Update to the update: To deal with the case where the first x residues in the motif don't match the target sequence, I think you should be able to do something like wrapping the LOOP block in another for loop to iterate over motif residues while looking for an initial match.
Hmmm. I'm still not sure I've quite got what you're looking for. At what point do you call a match significant? I.e. do you want target sequences matching only 4 motif residues or more, for example? Or will just a single matching residue do (which I doubt, but which is of course the easiest case)?
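Until that is settled, here is a sketch that leaves the threshold as a parameter. It drops leading motif positions one at a time and gives up once fewer than $min_len positions are left. The name find_trimmed_motif, the $min_len parameter and the "W" in the example motif are all my own assumptions for illustration, and the residue sets are again assumed to be plain letters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tries the motif with 0, 1, 2, ... leading
# positions dropped and returns (positions skipped, 1-based start,
# matched text) for the first variant that matches.
sub find_trimmed_motif {
    my ($seq, $min_len, @pattern) = @_;
    for my $skip (0 .. $#pattern) {
        my @rest = @pattern[$skip .. $#pattern];
        last if @rest < $min_len;    # too few positions left to be significant
        my $motif = join '', map { "[$_]" } @rest;
        if ($seq =~ /($motif)/) {
            return ($skip, $-[1] + 1, $1);
        }
    }
    return;    # empty list: nothing significant found
}

# "W" never occurs in the sequence, so the first motif position is skipped.
my ($skip, $start, $hit) =
    find_trimmed_motif("ASPTFHKLDTPRLAKLJHHDFSDA", 2, "W", "P", "RK", "ILVF");
print "$hit at $start (skipped $skip motif position(s))\n" if defined $skip;
```

Raising $min_len to 4 in this example makes the sub return nothing, because only three motif positions remain once the first one is dropped.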