Okay, I think I understand now. I also think the following code should do what you want. It takes a sequence ($seq) and a motif (@motif, which may be degenerate) and finds all matches of three or more consecutive motif residues.
Again, this may be a suboptimal solution and I can't shake the nagging feeling that there's a simpler way.
Of course, that describes all my Perl experience to date :-)
Apologies to all and sundry for once again descending into biological jargon. It's the only way I can get my head round this stuff...
#!/usr/bin/perl -w
use strict;
use warnings;
my $seq="APKLGIYSPRIGLYHFHKLDTPRLGAKLJHHDGFYSDA";
my @motif=("ST","P","RK","ILVF","G","ILVFM","Y");
# set up motif array of arrays: ("ST","P",...) becomes (['S','T'],['P'],...)
my @motifarray;
for (my $e=0;$e<=$#motif;$e++){
    my @elementarray = split (//, $motif[$e]);   # one element per residue
    $motifarray[$e]=\@elementarray;
}
my $mstartpos = 0;   # starting point within motif
my $success   = 0;
local $" = '';       # so "[@r0]" interpolates as "[ST]", not "[S T]"
# cycle through starting motif residues ("ST","P" etc.)
MOTIFRES: while ($mstartpos+1 < $#motif){
    # find all matches for a given starting motif residue
    my $test=$seq;
    my $lastmatchpos=0;
    while ($lastmatchpos < length($seq)){
        my $found='';
        # deal with the first 3 residue matches as a special case
        my @r0=@{$motifarray[$mstartpos]};
        my @r1=@{$motifarray[$mstartpos+1]};
        my @r2=@{$motifarray[$mstartpos+2]};
        if ($test=~ /([@r0])(?=[@r1][@r2])/gc){
            $found = $1;
            $lastmatchpos=pos($test);   # just past the first residue, i.e. its 1-based position
        }
        # next motif starting residue if no further matches found
        unless ($found){
            $mstartpos++;
            next MOTIFRES;
        }
        # extend the match one motif position at a time, stopping at the
        # first position that doesn't fit (otherwise the next motif element
        # would be tested against the same residue)
        for (my $e=$mstartpos+1;$e<=$#motifarray;$e++){
            my @rn=@{$motifarray[$e]};
            last unless $test=~ /\G([@rn])/gc;
            $found .= $1;
        }
        # print out what we've got so far
        $success++;
        print ("$found at $lastmatchpos\n");
    }
    # repeat, using the next motif residue as the new starting point
    $mstartpos++;
}
die ("No matches found.\n") unless ($success);
print ("Total number of matches (nested or otherwise): $success\n");
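The extension step leans on two details of Perl's m//g: \G anchors the match at the current pos() of the string, and /c stops a failed match from resetting pos(). A tiny standalone demo (the string "SPRA" and the @seen array are made up for illustration) shows how pos() moves:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = "SPRA";
my @seen;
($s =~ /S/gc) or die "no S";                  # pos($s) is now 1
push @seen, pos($s);
push @seen, ($s =~ /\GP/gc) ? "P" : "-";      # matches at pos 1, pos -> 2
push @seen, ($s =~ /\GX/gc) ? "X" : "-";      # fails; /c keeps pos at 2
push @seen, pos($s);
push @seen, ($s =~ /\GR/gc) ? "R" : "-";      # resumes at pos 2, pos -> 3
push @seen, pos($s);
$s =~ /\GZ/g;                                 # fails without /c: pos is reset
push @seen, defined pos($s) ? pos($s) : "undef";
print join(" ", @seen), "\n";                 # prints "1 P - 2 R 3 undef"
```

Because a failed \G match leaves pos() where it was, the next attempt starts at the same residue, which is why a mismatch should end the extension rather than move on to the next motif element.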
Have fun,
Tim
Update: Minor bugfix; also removed a couple of superfluous and misconceived lines to tidy it up a bit.
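As for the nagging feeling about a simpler way: one candidate is to let the regex engine do the extension. For each starting motif element, the first three elements are required and each later one is an optional group nested inside the previous one, so a greedy match stops at the first position that doesn't fit. This is only a sketch: motif_regex and find_motif_matches are hypothetical helper names, and it assumes motif elements are plain residue letters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: regex for motif elements $start..$#motif, e.g.
# ([ST][P][RK](?:[ILVF](?:[G](?:[ILVFM](?:[Y])?)?)?)?) for $start = 0
sub motif_regex {
    my ($start, @motif) = @_;
    my @classes = map { '[' . quotemeta($_) . ']' } @motif[$start .. $#motif];
    my $tail = '';
    for my $class (reverse @classes[3 .. $#classes]) {
        $tail = "(?:$class$tail)?";    # optional, nested extension
    }
    my $head = join '', @classes[0 .. 2];   # first three residues are required
    return qr/($head$tail)/;
}

# Hypothetical helper: [match, 1-based start] pairs for every starting element
sub find_motif_matches {
    my ($seq, @motif) = @_;
    my @hits;
    for my $start (0 .. $#motif - 2) {
        my $re = motif_regex($start, @motif);
        while ($seq =~ /$re/g) {       # non-overlapping, left to right
            push @hits, [ $1, $-[0] + 1 ];
        }
    }
    return @hits;
}

my $seq   = "APKLGIYSPRIGLYHFHKLDTPRLGAKLJHHDGFYSDA";
my @motif = ("ST","P","RK","ILVF","G","ILVFM","Y");
my @hits  = find_motif_matches($seq, @motif);
print "$_->[0] at $_->[1]\n" for @hits;
print "Total number of matches (nested or otherwise): ", scalar(@hits), "\n";
```

On the example sequence this should print the same 13 matches as the loop-based script, starting with "SPRIGLY at 8". Like that script, each search resumes after the end of the previous match, so matches that overlap for the same starting element are not reported.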