Bio_student has asked for the wisdom of the Perl Monks concerning the following question:

Hello guys! Could you help me with a Perl script to find the match pattern between these two sequences? For example: $seq1 = AAAAACCCGGGTTTTTAAA; $seq2 = AAAAACGGGTTTCCCAGAGA. The result should be the matching pattern.

Replies are listed 'Best First'.
Re: How do I find match patterns between two DNA sequences?
by marto (Cardinal) on Oct 26, 2010 at 10:38 UTC
Re: How do I find match patterns between two DNA sequences?
by umasuresh (Hermit) on Oct 26, 2010 at 13:02 UTC
    A must-have book for a bioinformatics beginner is Beginning Perl for Bioinformatics.
    It teaches you to quickly understand Perl's power in analyzing DNA and protein sequences.
Re: How do I find match patterns between two DNA sequences?
by jethro (Monsignor) on Oct 26, 2010 at 10:42 UTC

    There is very good documentation about regexes in perlre. The regex you are looking for is really simple to construct; just remember that the first pair of parentheses in a regex puts the matched contents into $1.

    After that you just need to quotemeta the string, and then you can use it inside a regex.

    If you have further problems, just ask again, but post the code you have already written.
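A minimal sketch of the capture-then-quotemeta idea above, assuming a "match" means a window of exactly `$min` characters from the first sequence that also occurs somewhere in the second. The helper name `common_windows` and the default of 5 are illustrative assumptions, not something from the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper: return each distinct $min-character window of $s1
# that also occurs anywhere in $s2.
sub common_windows {
    my ( $s1, $s2, $min ) = @_;
    $min = 5 unless defined $min;    # assumed default window length
    my ( %seen, @common );

    # The lookahead lets successive windows overlap; the inner
    # parentheses capture the current window into $1.
    while ( $s1 =~ /(?=(.{$min}))/g ) {
        my $window = $1;
        next if $seen{$window}++;

        # quotemeta escapes any regex metacharacters before the
        # window is interpolated into the second regex.
        my $quoted = quotemeta $window;
        push @common, $window if $s2 =~ /$quoted/;
    }
    return @common;
}

print "$_\n" for common_windows( 'AAGGTTCCTTAAGGAA', 'AAGGTTCCGGGGGGGGGG', 5 );
```

For plain A/C/G/T data the quotemeta step changes nothing, but it keeps the regex safe if the input ever contains characters such as `.` or `*`.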

      What I mean is: if I have two sequences, for example Seq1 = AAGGTTCCTTAAGGAA and Seq2 = AAGGTTCCGGGGGGGGGG, how can I find the substrings of at least 5 characters that occur in both sequences (e.g. AAGGTTCC or others) using Perl? I hope this is not confusing...

        my $s1 = 'AAGGTTCCTTAAGGAA';
        my $s2 = 'AAGGTTCCGGGGGGGGGG';
        for my $start ( 0 .. length( $s1 ) - 5 ) {
            for my $len ( reverse 5 .. length( $s1 ) - $start ) {
                my $n  = substr $s1, $start, $len;
                my $p2 = 1 + index $s2, $n;
                printf "s1:%d s2:%d '%s'\n", $start, $p2 - 1, $n if $p2;
            }
        }

        Output:

        s1:0 s2:0 'AAGGTTCC'
        s1:0 s2:0 'AAGGTTC'
        s1:0 s2:0 'AAGGTT'
        s1:0 s2:0 'AAGGT'
        s1:1 s2:1 'AGGTTCC'
        s1:1 s2:1 'AGGTTC'
        s1:1 s2:1 'AGGTT'
        s1:2 s2:2 'GGTTCC'
        s1:2 s2:2 'GGTTC'
        s1:3 s2:3 'GTTCC'

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
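The substr/index loop above reports every shared substring, including the shorter ones nested inside a longer match. If only the longest runs are wanted, one possible variation is sketched below. It assumes that a match lying entirely inside an already reported match should be skipped; the helper name `maximal_common` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: for each start offset in $s1, take the longest
# substring (at least $min characters) that also occurs in $s2, and skip
# it when it ends no later than the previously reported match.
sub maximal_common {
    my ( $s1, $s2, $min ) = @_;
    my @found;
    my $covered = 0;    # end offset (exclusive) in $s1 of the last reported match

    for my $start ( 0 .. length($s1) - $min ) {
        for my $len ( reverse $min .. length($s1) - $start ) {
            my $sub = substr $s1, $start, $len;
            my $pos = index $s2, $sub;
            next if $pos < 0;    # not in $s2, try a shorter length

            if ( $start + $len > $covered ) {
                push @found, [ $start, $pos, $sub ];
                $covered = $start + $len;
            }
            last;    # longest match for this start has been handled
        }
    }
    return @found;
}

printf "s1:%d s2:%d '%s'\n", @$_
    for maximal_common( 'AAGGTTCCTTAAGGAA', 'AAGGTTCCGGGGGGGGGG', 5 );
# prints: s1:0 s2:0 'AAGGTTCC'
```

Like the original loop, this is quadratic in the length of the first sequence, which is fine for short strings but probably too slow for genome-sized input.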
Re: How do I find match patterns between two DNA sequences?
by johngg (Canon) on Oct 26, 2010 at 12:28 UTC

    As Marshall has said, it is difficult to divine your intent. Perhaps you want to find the character offsets where you have matching letters in each string?

    knoppix@Microknoppix:~$ perl -E '
        $seq1 = q{AAAAACCCGGGTTTTTAAA};
        $seq2 = q{AAAAACGGGTTTCCCAGAGA};
        $same = $seq1 ^ $seq2;
        push @posn, pos( $same ) while $same =~ m{(?=\x00)}g;
        say for @posn;'
    0
    1
    2
    3
    4
    5
    8
    11
    17
    knoppix@Microknoppix:~$

    Please explain again what you are trying to achieve.
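If the XOR idea above is close to what is wanted, the individual offsets can be grouped into runs. The following is only a sketch, assuming a "match" is a stretch of at least `$min` identical characters at the same offsets in both strings; the helper name `aligned_runs` is an assumption for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: find stretches of at least $min positions where
# both strings hold the same character at the same offset.
sub aligned_runs {
    my ( $s1, $s2, $min ) = @_;

    # String XOR yields a NUL byte wherever the two characters are equal.
    my $same = $s1 ^ $s2;

    my @runs;
    while ( $same =~ /\x00{$min,}/g ) {
        my $start = $-[0];              # offset where this run begins
        my $len   = $+[0] - $-[0];      # length of the run
        push @runs, [ $start, substr( $s1, $start, $len ) ];
    }
    return @runs;
}

printf "offset %d: %s\n", @$_
    for aligned_runs( 'AAAAACCCGGGTTTTTAAA', 'AAAAACGGGTTTCCCAGAGA', 5 );
# prints: offset 0: AAAAAC
```

Note that this only finds agreement at identical offsets. A shared substring that sits at different positions in the two sequences is not detected this way.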

    Cheers,

    JohnGG

Re: How do I find match patterns between two DNA sequences?
by shevek (Beadle) on Oct 26, 2010 at 12:20 UTC
    Hello Bio Student, please give an example of what you mean by "match". Do you mean an exact match, the longest common substring, the longest common subsequence, or something else?

    Also, how big is the text for which you are trying to find your pattern? Do you only want the first match, or all matches? A little more detail would help greatly.

    It would also help if you could give an example that shows what you believe the code should return for your inputs.

      Just like in the example sequences: a match means a string of at least 5 characters that occurs in both sequences. The output should be the strings that match.
Re: How do I find match patterns between two DNA sequences?
by Marshall (Canon) on Oct 26, 2010 at 10:29 UTC
    I have no idea what you want. What would be considered a "match"? When asking a question, please be as specific as possible; in this case, for example, what would the intended result be? It is also better if you have tried some code yourself. Perl is great at this codon stuff, but what you've presented here is like a UFO sighting from 7 years ago.
Re: How do I find match patterns between two DNA sequences?
by locked_user sundialsvc4 (Abbot) on Oct 26, 2010 at 18:19 UTC

    Here’s what I would suggest:

    1. Define your problem: Before you can devise a computerized solution to a problem, you must define the problem exactly.
    2. Research “prior art”: No matter what you are doing, it has been done before. Review all the tutorials mentioned, books, whatever. Carefully review the CPAN library to determine what resources you can use from there.
    3. Study regular-expressions: “Regexes” are the basic Swiss Army Knife® for string-banging. You will undoubtedly use them heavily in this application.