Okay, I think I understand now. I also think that the following code should do what you want. It takes a sequence ($seq) and a motif (@motif, which may be degenerate) and finds all the matches of at least 3 residues.

Again, this may be a suboptimal solution and I can't shake the nagging feeling that there's a simpler way.

Of course, that describes all my Perl experience to date :-)

Apologies to all and sundry for once again descending into biological jargon. It's the only way I can get my head round this stuff...

#!/usr/bin/perl -w

use strict;
use warnings;

my $seq="APKLGIYSPRIGLYHFHKLDTPRLGAKLJHHDGFYSDA";
my @motif=("ST","P","RK","ILVF","G","ILVFM","Y");

# set up motif array of arrays

my @motifarray; 
for (my $e=0;$e<=$#motif;$e++){
    my @elementarray= split (/ */, $motif[$e]);
    $motifarray[$e]=\@elementarray;
}

my $mstartpos = 0;  # starting point within motif 
my $success = 0;

# cycle through starting motif residues ("ST","P" etc.)

MOTIFRES: while ($mstartpos+1 < $#motif){
    
    # find all matches for a given starting motif residue

    my $test=$seq;
    my $lastmatchpos=0;
    while ($lastmatchpos < length($seq)){
        my $found='';

        # deal with the first 3 residue matches as a special case

        my @r0=@{$motifarray[$mstartpos]};   
        my @r1=@{$motifarray[$mstartpos+1]};
        my @r2=@{$motifarray[$mstartpos+2]};   
        if ($test=~ /([@r0])(?=[@r1][@r2])/gc){
            $found = $1;
            $lastmatchpos=pos($test);
        }

        # next motif starting residue if no further matches found

        unless ($found){
            $mstartpos++;
            next MOTIFRES;
        }
        
        # get all the other residues in the motif
        
        for (my $e=$mstartpos+1;$e<=$#motifarray;$e++){
            my @rn=@{$motifarray[$e]};
            if ($test=~ /\G([@rn])/gc){
                $found .= $1;
            }
            else {
                last;   # stop at the first motif position that fails to match
            }
        }

        # print out what we've got so far
        $success++;
        print ("$found at $lastmatchpos\n");
     }

    # repeat, using the next motif residue as the new starting point
    
    $mstartpos++;
}

die ("No matches found.\n") unless ($success);
print ("Total number of matches (nested or otherwise): $success\n");
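
For what it's worth, here is a minimal sketch of one possibly simpler approach. It is not the script above. It builds one regex per starting motif element, with the first three character classes mandatory and the remaining ones nested as optional groups, then scans with a lookahead so that overlapping matches are reported too. The helper name find_motif_matches and its $min parameter are made up for illustration. Offsets here are 0-based, so they differ by one from the positions printed by the script above, and because overlaps are included the totals may differ as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a list of
# [matched_text, zero_based_offset, motif_start_index] triples.
sub find_motif_matches {
    my ($seq, $motif, $min) = @_;
    my @hits;
    for my $start (0 .. @$motif - $min) {
        my @classes = map { "[$_]" } @{$motif}[$start .. $#{$motif}];

        # The first $min classes are mandatory. Each later class is optional
        # and nested, so a match stops at the first position that fails.
        my $tail = '';
        $tail = "(?:$_$tail)?" for reverse @classes[$min .. $#classes];
        my $re = join('', @classes[0 .. $min - 1]) . $tail;

        # The lookahead consumes nothing, so overlapping matches are found.
        while ($seq =~ /(?=($re))/g) {
            push @hits, [$1, pos($seq), $start];
        }
    }
    return @hits;
}

my @hits = find_motif_matches(
    "APKLGIYSPRIGLYHFHKLDTPRLGAKLJHHDGFYSDA",
    ["ST", "P", "RK", "ILVF", "G", "ILVFM", "Y"],
    3,
);
printf "%s at offset %d (motif element %d)\n", @$_ for @hits;
print "Total number of matches (nested or otherwise): ", scalar(@hits), "\n";
```

Nesting the optional groups is what keeps the motif contiguous: a later class can only match if every class before it matched.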

Have fun,
Tim

Update: Minor bugfix; also removed a couple of superfluous and misconceived lines to tidy it up a bit.

In reply to Re: Progressive pattern matching by tfrayner
in thread Progressive pattern matching by Anonymous Monk
