How do I find match patterns between two DNA sequences?

Bio_student has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How do I find match patterns between two DNA sequences? by marto (Cardinal) on Oct 26, 2010 at 10:38 UTC
I suggest you take a look at Perl and Bioinformatics in the Tutorials section of this site, as well as the BioPerl tutorial.
Re: How do I find match patterns between two DNA sequences? by umasuresh (Hermit) on Oct 26, 2010 at 13:02 UTC
A must-have book for a bioinformatics beginner is Beginning Perl for Bioinformatics. This book quickly teaches you Perl's power in analyzing DNA and protein sequences.
Re: How do I find match patterns between two DNA sequences? by jethro (Monsignor) on Oct 26, 2010 at 10:42 UTC
There is very good documentation about regexes in perlre. The regex you are looking for is really simple to construct; just remember that the first pair of parentheses in a regex puts the matched contents into $1. After that you just need to quotemeta the string, and then you can use it inside another regex. If you have further problems, just ask again, but post the code you have already written.
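A minimal sketch of the two pieces mentioned above, capturing into $1 and then reusing the captured text through quotemeta. The sample strings, the choice of capturing the first eight bases, and the variable names are illustrative assumptions, not part of the original question:

```perl
use strict;
use warnings;

# Illustrative inputs (assumed, not from the original post).
my $seq1 = 'AAGGTTCCTTAAGGAA';
my $seq2 = 'GGGAAGGTTCCGGG';

# The first pair of parentheses captures its match into $1.
my $motif;
if ( $seq1 =~ /^([ACGT]{8})/ ) {
    $motif = $1;
}

# quotemeta escapes any regex metacharacters, so the captured text is
# matched literally when it is interpolated into another regex.
my $safe = quotemeta $motif;
my $offset;
if ( $seq2 =~ /$safe/ ) {
    $offset = $-[0];    # start offset of the match in $seq2
    printf "'%s' found in seq2 at offset %d\n", $motif, $offset;
}
```

For plain DNA letters quotemeta changes nothing, but it keeps the pattern safe if the captured text ever contains characters such as `.` or `*`.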
Re^2: How do I find match patterns between two DNA sequences? by Bio_student (Novice) on Oct 26, 2010 at 15:27 UTC
What I mean is: if I have two sequences, for example seq1 = AAGGTTCCTTAAGGAA and seq2 = AAGGTTCCGGGGGGGGGG, how can I use Perl to find the substrings of at least 5 characters that appear in both sequences (e.g. AAGGTTCC, among others)? I hope that is not confusing...
Re^3: How do I find match patterns between two DNA sequences? by BrowserUk (Patriarch) on Oct 26, 2010 at 15:56 UTC
`my $s1 = 'AAGGTTCCTTAAGGAA';
my $s2 = 'AAGGTTCCGGGGGGGGGG';

# Try every start position in $s1, longest candidate first, and report
# each substring of 5 or more characters that also occurs in $s2.
for my $start ( 0 .. length( $s1 ) - 5 ) {
    for my $len ( reverse 5 .. length( $s1 ) - $start ) {
        my $n  = substr $s1, $start, $len;
        my $p2 = 1 + index $s2, $n;    # 0 when $n is absent from $s2
        printf "s1:%d s2:%d '%s'\n", $start, $p2 - 1, $n if $p2;
    }
}`

Output:

`s1:0 s2:0 'AAGGTTCC'
s1:0 s2:0 'AAGGTTC'
s1:0 s2:0 'AAGGTT'
s1:0 s2:0 'AAGGT'
s1:1 s2:1 'AGGTTCC'
s1:1 s2:1 'AGGTTC'
s1:1 s2:1 'AGGTT'
s1:2 s2:2 'GGTTCC'
s1:2 s2:2 'GGTTC'
s1:3 s2:3 'GTTCC'`

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP an inspiration; A true Folk's Guy
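The brute-force search in this reply also reports every shorter piece of a longer match. If only the matches that cannot be extended in either direction are wanted, one possible variation is a dynamic-programming pass over both strings. This is a sketch under that assumption, not BrowserUk's method; the function name `maximal_common_substrings` is hypothetical:

```perl
use strict;
use warnings;

# Return [start_in_s1, start_in_s2, substring] for every common run of at
# least $min characters that cannot be extended to the left or right.
sub maximal_common_substrings {
    my ( $s1, $s2, $min ) = @_;
    my @a    = split //, $s1;
    my @b    = split //, $s2;
    my @prev = (0) x ( @b + 1 );    # run lengths from the previous row
    my @hits;
    for my $i ( 1 .. @a ) {
        my @cur = (0) x ( @b + 1 );
        for my $j ( 1 .. @b ) {
            next unless $a[ $i - 1 ] eq $b[ $j - 1 ];
            # Length of the matching run ending at this pair of positions.
            my $len = $cur[$j] = $prev[ $j - 1 ] + 1;
            # The run ends here if a string is exhausted or the next pair differs.
            my $ends = $i == @a || $j == @b || $a[$i] ne $b[$j];
            push @hits, [ $i - $len, $j - $len, substr( $s1, $i - $len, $len ) ]
                if $ends && $len >= $min;
        }
        @prev = @cur;
    }
    return @hits;
}

for my $hit ( maximal_common_substrings( 'AAGGTTCCTTAAGGAA', 'AAGGTTCCGGGGGGGGGG', 5 ) ) {
    printf "s1:%d s2:%d '%s'\n", @$hit;
}
```

For the two example sequences this prints the single line `s1:0 s2:0 'AAGGTTCC'`. The time cost grows with the product of the two lengths, so for long sequences a dedicated alignment tool is likely a better fit.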
Re: How do I find match patterns between two DNA sequences? by johngg (Canon) on Oct 26, 2010 at 12:28 UTC
As Marshall has said, it is difficult to divine your intent. Perhaps you want to find the character offsets where you have matching letters in each string?

`knoppix@Microknoppix:~$ perl -E '
$seq1 = q{AAAAACCCGGGTTTTTAAA};
$seq2 = q{AAAAACGGGTTTCCCAGAGA};
$same = $seq1 ^ $seq2;
push @posn, pos( $same ) while $same =~ m{(?=\x00)}g;
say for @posn;'
0
1
2
3
4
5
8
11
17
knoppix@Microknoppix:~$`

Please explain again what you are trying to achieve.

Cheers,
JohnGG
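The same XOR trick can be extended to report whole runs rather than single positions. The sketch below assumes that a "match" means at least 5 identical characters at the same offsets in both strings, which is narrower than a general common-substring search; `$min` and `@runs` are illustrative names:

```perl
use strict;
use warnings;

my $seq1 = 'AAAAACCCGGGTTTTTAAA';
my $seq2 = 'AAAAACGGGTTTCCCAGAGA';
my $min  = 5;    # assumed minimum run length

# XOR of two equal characters is "\x00", so runs of NUL bytes mark
# stretches where both strings agree at the same offsets.
my $same = $seq1 ^ $seq2;

my @runs;
while ( $same =~ /\x00{$min,}/g ) {
    my ( $start, $len ) = ( $-[0], $+[0] - $-[0] );
    push @runs, [ $start, substr( $seq1, $start, $len ) ];
}
printf "offset %d: '%s'\n", @$_ for @runs;
```

With these inputs the only aligned run of 5 or more is `AAAAAC` at offset 0. A shared substring that sits at different offsets in the two strings will not be found this way.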
Re: How do I find match patterns between two DNA sequences? by shevek (Beadle) on Oct 26, 2010 at 12:20 UTC
Hello Bio Student, please give an example of what you mean by "match". Do you mean an exact match, the longest common substring, the longest common subsequence, or something else? Also, how big is the text in which you are trying to find your pattern? Do you want only the first match, or all matches? A little more detail would help greatly. An example showing what you believe the code should return for your inputs would help as well.
Re^2: How do I find match patterns between two DNA sequences? by Bio_student (Novice) on Oct 26, 2010 at 15:01 UTC
Just as in the example sequences, a match means a string of at least 5 characters that appears in both sequences; the output should be the strings that match.
Re: How do I find match patterns between two DNA sequences? by Marshall (Canon) on Oct 26, 2010 at 10:29 UTC
I have no idea what you want: what would be considered a "match"? When asking a question, please be as specific as possible. For example, in this case: what would the intended result be? I have no idea. It is also better if you have tried some code yourself. Perl is great at this codon stuff, but what you've presented here is like a UFO sighting from 7 years ago.
Re: How do I find match patterns between two DNA sequences? by locked_user sundialsvc4 (Abbot) on Oct 26, 2010 at 18:19 UTC
Here’s what I would suggest:

Define your problem: Before you can devise a computerized solution to a problem, you must define the problem exactly.

Research “prior art”: No matter what you are doing, it has been done before. Review all the tutorials mentioned, books, whatever. Carefully review the CPAN library to determine what resources you can use from there.

Study regular expressions: “Regexes” are the basic Swiss Army Knife® for string-banging. You will undoubtedly use them heavily in this application.