Re: How do I find match patterns between two DNA sequences?
by marto (Cardinal) on Oct 26, 2010 at 10:38 UTC
|
| [reply] |
Re: How do I find match patterns between two DNA sequences?
by umasuresh (Hermit) on Oct 26, 2010 at 13:02 UTC
|
A must-have book for a bioinformatics beginner is Beginning Perl for Bioinformatics. It quickly shows how Perl can be used to analyze DNA and protein sequences.
| [reply] |
Re: How do I find match patterns between two DNA sequences?
by jethro (Monsignor) on Oct 26, 2010 at 10:42 UTC
|
There is very good documentation about regexes in perlre. The regex you are looking for is simple to construct; just remember that the first pair of parentheses in a regex captures the matched text into $1.
After that, you just need to quotemeta the string, and then you can use it inside another regex.
If you have further problems, just ask again, but post the code you have already written.
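A minimal sketch of that approach, assuming the aim is to capture a piece of one sequence and then search for it literally in another. The helper name find_fragment and the fixed fragment length are illustrative assumptions, not code from the original poster; the sample sequences are the ones given later in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the first $len characters of $seq1 and
# report where that fragment first occurs in $seq2.
sub find_fragment {
    my ( $seq1, $seq2, $len ) = @_;

    # The first pair of parentheses puts the matched text into $1.
    return unless $seq1 =~ /^(.{$len})/;
    my $fragment = $1;

    # quotemeta escapes regex metacharacters so the fragment matches literally.
    my $pattern = quotemeta $fragment;
    return unless $seq2 =~ /$pattern/;

    # $-[0] is the offset at which the last successful match began.
    return ( $fragment, $-[0] );
}

my ( $fragment, $offset ) = find_fragment( 'AAGGTTCCTTAAGGAA', 'AAGGTTCCGGGGGGGGGG', 5 );
printf "'%s' found at offset %d\n", $fragment, $offset if defined $fragment;
```

For plain A/C/G/T strings quotemeta changes nothing, but it keeps the pattern safe if the input ever contains characters such as . or *.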
| [reply] |
|
|
What I mean is:
If I have two sequences, for example:
Seq1 = AAGGTTCCTTAAGGAA and Seq2 = AAGGTTCCGGGGGGGGGG
then how can I find the substrings of at least 5 characters that occur in both sequences (e.g. AAGGTTCC or others) using Perl?
I hope that isn't confusing...
| [reply] |
|
|
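my $s1 = 'AAGGTTCCTTAAGGAA';
my $s2 = 'AAGGTTCCGGGGGGGGGG';
# Try every start position in $s1 that leaves room for at least 5 characters.
for my $start ( 0 .. length( $s1 ) - 5 ) {
    # Try the longest candidate first, down to the minimum length of 5.
    for my $len ( reverse 5 .. length( $s1 ) - $start ) {
        my $n = substr $s1, $start, $len;
        # index returns -1 when $n is absent, so $p2 is 0 (false) on a miss.
        my $p2 = 1 + index $s2, $n;
        printf "s1:%d s2:%d '%s'\n", $start, $p2 - 1, $n if $p2;
    }
}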
s1:0 s2:0 'AAGGTTCC'
s1:0 s2:0 'AAGGTTC'
s1:0 s2:0 'AAGGTT'
s1:0 s2:0 'AAGGT'
s1:1 s2:1 'AGGTTCC'
s1:1 s2:1 'AGGTTC'
s1:1 s2:1 'AGGTT'
s1:2 s2:2 'GGTTCC'
s1:2 s2:2 'GGTTC'
s1:3 s2:3 'GTTCC'
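The loop above prints every common substring, including the shorter ones nested inside a longer match. A possible variation, assuming only the longest match from each start position is wanted: because lengths are tried longest first, the inner loop can stop at the first hit. The sub name longest_matches and its $min parameter are illustrative, not part of the code above.

```perl
use strict;
use warnings;

# Return [offset_in_s1, offset_in_s2, substring] for the longest common
# substring of at least $min characters starting at each position of $s1.
sub longest_matches {
    my ( $s1, $s2, $min ) = @_;
    my @hits;
    for my $start ( 0 .. length($s1) - $min ) {
        for my $len ( reverse $min .. length($s1) - $start ) {
            my $candidate = substr $s1, $start, $len;
            my $pos       = index $s2, $candidate;
            next if $pos < 0;    # not present in $s2, try a shorter one
            push @hits, [ $start, $pos, $candidate ];
            last;                # longest first, so the first hit is the longest
        }
    }
    return @hits;
}

printf "s1:%d s2:%d '%s'\n", @$_
    for longest_matches( 'AAGGTTCCTTAAGGAA', 'AAGGTTCCGGGGGGGGGG', 5 );
```

This still reports overlapping matches that begin at different offsets (AGGTTCC as well as AAGGTTCC); filtering those out would need an extra pass.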
| [reply] [d/l] |
Re: How do I find match patterns between two DNA sequences?
by johngg (Canon) on Oct 26, 2010 at 12:28 UTC
|
As Marshall has said, it is difficult to divine your intent. Perhaps you want to find the character offsets where you have matching letters in each string?
knoppix@Microknoppix:~$ perl -E '
$seq1 = q{AAAAACCCGGGTTTTTAAA};
$seq2 = q{AAAAACGGGTTTCCCAGAGA};
$same = $seq1 ^ $seq2;
push @posn, pos( $same ) while $same =~ m{(?=\x00)}g;
say for @posn;'
0
1
2
3
4
5
8
11
17
knoppix@Microknoppix:~$
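If matches at the same offset in both strings are what is wanted, the XOR result can also be scanned for runs rather than single positions. A rough sketch, assuming a run of at least 5 identical aligned characters counts as a match; the sub name aligned_runs and the $min parameter are made up for illustration.

```perl
use strict;
use warnings;

# Return [offset, text] for each stretch of at least $min characters that is
# identical at the same offset in both sequences.
sub aligned_runs {
    my ( $seq1, $seq2, $min ) = @_;

    # String XOR: positions where the two bytes are equal become "\x00".
    my $same = $seq1 ^ $seq2;

    my @runs;
    while ( $same =~ /(\x00{$min,})/g ) {
        my $len    = length $1;
        my $offset = pos($same) - $len;    # pos() is the end of the match
        push @runs, [ $offset, substr( $seq1, $offset, $len ) ];
    }
    return @runs;
}

printf "offset %d: %s\n", @$_
    for aligned_runs( 'AAAAACCCGGGTTTTTAAA', 'AAAAACGGGTTTCCCAGAGA', 5 );
```

Unlike a substring search with index, this only finds stretches that line up position for position, so a shared motif that is shifted in one sequence will be missed.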
Please explain again what you are trying to achieve.
| [reply] [d/l] |
Re: How do I find match patterns between two DNA sequences?
by shevek (Beadle) on Oct 26, 2010 at 12:20 UTC
|
Hello Bio Student,
Please give an example of what you mean by "match". Do you mean an exact match, the longest common substring, the longest common subsequence, or something else? Also, how big is the text in which you are trying to find your pattern? Do you want only the first match, or all matches? A little more detail would help greatly. An example showing what you believe the code should return for your inputs would help as well.
| [reply] |
|
|
Just like in the example sequences: a match means a string of at least 5 characters that occurs in both sequences. The output should be the strings that match.
| [reply] |
Re: How do I find match patterns between two DNA sequences?
by Marshall (Canon) on Oct 26, 2010 at 10:29 UTC
|
I have no idea what you want. What would be considered a "match"? When asking a question, please be as specific as possible; in this case, for example, what would the intended result be? It is also better if you have tried some code yourself. Perl is great at this codon stuff, but what you've presented here is like a UFO sighting from 7 years ago.
| [reply] |
Re: How do I find match patterns between two DNA sequences?
by locked_user sundialsvc4 (Abbot) on Oct 26, 2010 at 18:19 UTC
|