Okay, I think I understand now. I also think that the following code should do what you want. It takes a sequence ($seq) and a motif (@motif, which may be degenerate) and finds all the matches of greater than 3 residues.
Again, this may be a suboptimal solution and I can't shake the nagging feeling that there's a simpler way.
Of course, that describes all my Perl experience to date :-)
Apologies to all and sundry for once again descending into biological jargon. It's the only way I can get my head round this stuff...
#!/usr/bin/perl -w
use strict;
use warnings;

my $seq   = "APKLGIYSPRIGLYHFHKLDTPRLGAKLJHHDGFYSDA";
my @motif = ("ST", "P", "RK", "ILVF", "G", "ILVFM", "Y");

# set up motif array of arrays (one list of allowed residues per position)
my @motifarray;
for (my $e = 0; $e <= $#motif; $e++) {
    my @elementarray = split(/ */, $motif[$e]);
    $motifarray[$e] = \@elementarray;
}

my $mstartpos = 0;    # starting point within motif
my $success   = 0;

# cycle through starting motif residues ("ST","P" etc.)
MOTIFRES: while ($mstartpos + 1 < $#motif) {
    # find all matches for a given starting motif residue
    my $test         = $seq;
    my $lastmatchpos = 0;
    while ($lastmatchpos < length($seq)) {
        my $found = '';

        # deal with the first 3 residue matches as a special case
        # (interpolating an array into [...] also adds spaces to the
        # class, which is harmless as long as $seq contains none)
        my @r0 = @{ $motifarray[$mstartpos] };
        my @r1 = @{ $motifarray[$mstartpos + 1] };
        my @r2 = @{ $motifarray[$mstartpos + 2] };
        if ($test =~ /([@r0])(?=[@r1][@r2])/gc) {
            $found        = $1;
            $lastmatchpos = pos($test);    # 1-based position of first residue
        }

        # next motif starting residue if no further matches found
        unless ($found) {
            $mstartpos++;
            next MOTIFRES;
        }

        # get all the other residues in the motif
        for (my $e = $mstartpos + 1; $e <= $#motifarray; $e++) {
            my @rn = @{ $motifarray[$e] };
            if ($test =~ /\G([@rn])/gc) {
                $found .= $1;
            }
            else {
                # stop at the first mismatch so that a later motif
                # position cannot match in place of the one that failed
                last;
            }
        }

        # print out what we've got so far
        $success++;
        print("$found at $lastmatchpos\n");
    }

    # repeat, using the next motif residue as the new starting point
    $mstartpos++;
}

die("No matches found.\n") unless ($success);
print("Total number of matches (nested or otherwise): $success\n");
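On the nagging feeling that there's a simpler way: one possibility, offered only as a rough sketch and not as a drop-in replacement, is to let the regex engine do the walking. For each starting motif element you build a single pattern in which the first three character classes are required and the rest are nested optional groups. The sub names motif_regex and find_motif_matches are made up for illustration, and the sketch assumes that a match must follow the motif contiguously and stop at the first position that fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a regex for the motif beginning at element
# $start. The first three classes are required; the remaining ones are
# nested optional groups, so the match stops at the first element that
# fails to match.
sub motif_regex {
    my ($start, @motif) = @_;
    my @classes = map("[$_]", @motif[$start .. $#motif]);
    my $tail = '';
    for my $class (reverse @classes[3 .. $#classes]) {
        $tail = "(?:$class$tail)?";
    }
    my $pattern = join('', @classes[0 .. 2]) . $tail;
    return qr/$pattern/;
}

# Hypothetical helper: try every starting element that still leaves at
# least three motif positions, and collect each match with its position.
sub find_motif_matches {
    my ($seq, @motif) = @_;
    my @hits;
    for my $start (0 .. $#motif - 2) {
        my $re = motif_regex($start, @motif);
        while ($seq =~ /($re)/g) {
            # $-[1] is the 0-based offset of the match; report it 1-based
            push @hits, { match => $1, pos => $-[1] + 1 };
        }
    }
    return @hits;
}

my @hits = find_motif_matches(
    "APKLGIYSPRIGLYHFHKLDTPRLGAKLJHHDGFYSDA",
    "ST", "P", "RK", "ILVF", "G", "ILVFM", "Y",
);
print "$_->{match} at $_->{pos}\n" for @hits;
print "Total number of matches: ", scalar(@hits), "\n";
```

The nested (?:...)? groups are greedy, so each match runs as far along the motif as the sequence allows, and the next search resumes after the end of that match.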
Have fun,
Tim
Update: Minor bugfix; also removed a couple of superfluous and misconceived lines to tidy it up a bit.