Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
This problem happens to be of interest to me as well. I think the following code does what you're getting at. It's a little crude, but the best I can do at this point is:
#!/usr/bin/perl use strict; use warnings; my $seq="ASPTFHKLDTPRLAKLJHHDFSDA"; my @pattern=("ST","P","RK","ILVF"); # array of refs to arrays of redundant # residues within the pattern my @patternarray; for (my $e=0;$e<=$#pattern;$e++){ my @elementarray= split (/ */, $pattern[$e]); $patternarray[$e]=\@elementarray; } my $found; my $lastmatchpos; LOOP: until ($found){ # deal with the first residue match as a special case my @resarray=@{$patternarray[0]}; $seq=~ /([@resarray])/gc; die("Sequence does not contain requested motif.\n") unless $1; $found = $1; $lastmatchpos=pos($seq); # all the other residues in the pattern for (my $e=1;$e<=$#patternarray;$e++){ my @resarray=@{$patternarray[$e]}; if ($seq=~ /\G([@resarray])/gc){ $found .= $1; }else{ #reset matching algorithm my $newmatchpos=$lastmatchpos+1; last if ($newmatchpos > length($seq)); pos($newmatchpos); $found=''; next LOOP; } } } print ("$found at $lastmatchpos\n");

This only matches the first occurrance of a motif in a given sequence. It should be possible to extend this to return all the matches with a little work. For use of the m/\G.../gc idiom, see perldoc:perlop.

Hope this helps,
Tim

Update: Sorry, I just re-read the original request and one of the nuances escaped me. To get the script to just print out the most it can match after having matched the initial residue, I think you can just change the else clause to:

}else{ next LOOP; }

Update to the update: To deal with the case where the first x residues in the motif don't match the target sequence, I think you should be able to do something like wrapping the LOOP block in another for loop to iterate over motif residues while looking for an initial match.

Hmmm. I'm still not sure I've quite got what your're looking for. At what point do you call a match significant? I.e. do you want target sequences matching only 4 motif residues or more, for example? Or will just a single matching residue do (which I doubt, but which is of course the easiest case)?


In reply to Re: Progressive pattern matching by tfrayner
in thread Progressive pattern matching by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (7)
As of 2024-04-24 08:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found