onlyIDleft has asked for the wisdom of the Perl Monks concerning the following question:
Greetings Perl Monks!
I seek your wisdom for a problem of mine: I need to align 2 short DNA sequences. The starting material is 2 sequences, each 10 nucleotides long (the sequences should typically consist of As, Ts, Gs, or Cs; when a base is ambiguous, Ns can also be expected).
There are a few different possibilities for how the sequences may align to each other, and I list them below:
a. 9 out of 10 in both align to each other perfectly
b. 10 out of 10 in both align to each other perfectly
c. 9 in one and 10 in other align to each other - with this imperfect alignment due to insertion/deletion
d. 9 out of 9 in both align to each other, but imperfectly due to substitution - but I will allow only one such substitution - for biological reasons
e. 10 out of 10 in both align to each other, but imperfectly due to substitution - but I will allow only one such substitution - again for biological reasons
In all of the cases above, neither sequence can contain anything but A/T/G/C. If either contains other letters, such as Ns, that pair needs to be discarded without even performing the match test.
As a first-pass attempt, I have cobbled together a script, but I know it does not test for all of the cases above. Would you please tell me if I should use a different approach to test all cases listed above, or can I adapt what I have already?
Thank you exalted ones!
if ((defined $upstream_putative_TSD) && (defined $downstream_putative_TSD)) {
    # Check if the putative TSDs differ by just 1 mismatch or are perfect matches
    my $max_SNP = 1;
    my $diffCount = () = ( $upstream_putative_TSD ^ $downstream_putative_TSD ) =~ /[^\x00]/g;
    # print $upstream_putative_TSD, "\t", $downstream_putative_TSD, "\t", $diffCount, "\n"; # OK thus far
    # syntax idea from https://www.biostars.org/p/83978/
    if ($diffCount <= $max_SNP) {
        my $upstream_putative_TSD_non_canonical_letter_count = $upstream_putative_TSD =~ tr/BDEFHIJKLMNOPQRSUVWXYZ//;
        # check to see whether upstream putative TSD contains anything but A/T/G/C, if yes, how many
        my $downstream_putative_TSD_non_canonical_letter_count = $downstream_putative_TSD =~ tr/BDEFHIJKLMNOPQRSUVWXYZ//;
        # check to see whether downstream putative TSD contains anything but A/T/G/C, if yes, how many
        if (($upstream_putative_TSD_non_canonical_letter_count == 0) && ($downstream_putative_TSD_non_canonical_letter_count == 0)) {
            print $_, "\n";
            push @output, $_, "\n";
        }
    }
}
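For what it is worth, here is a rough sketch of how all five cases might be tested in one place. It rests on assumptions that are not confirmed anywhere above: that "9 out of 10" means a contiguous 9-base window at either end of a 10-mer, that case c means deleting a single base from one full sequence yields a 9-base window of the other, and that both inputs are upper-case 10-mers. The helper names (`is_canonical`, `hamming`, `one_deletion_apart`, `classify_pair`) are made up for illustration. The cases overlap (a mismatch at the very last base satisfies both a and e), so the order of the checks, and therefore the label returned, is an arbitrary choice.

```perl
use strict;
use warnings;

# True if the string contains only upper-case A, C, G, T.
sub is_canonical {
    my ($seq) = @_;
    return $seq =~ /\A[ACGT]+\z/ ? 1 : 0;
}

# Number of mismatching positions between two equal-length strings.
# XOR of identical bytes is NUL, so count the non-NUL bytes.
sub hamming {
    my ($x, $y) = @_;
    my $xor = $x ^ $y;
    return $xor =~ tr/\0//c;
}

# True if removing exactly one character from $long yields $short.
sub one_deletion_apart {
    my ($long, $short) = @_;
    return 0 unless length($long) == length($short) + 1;
    for my $i (0 .. length($long) - 1) {
        my $copy = $long;
        substr($copy, $i, 1, '');
        return 1 if $copy eq $short;
    }
    return 0;
}

# Returns 'discard', one of the case letters 'a'..'e', or 'none'.
# ASSUMPTION: case definitions follow the interpretation described above.
sub classify_pair {
    my ($up, $down) = @_;

    # Discard before any matching if either sequence is not a clean 10-mer.
    return 'discard' unless is_canonical($up) && is_canonical($down);
    return 'discard' unless length($up) == 10 && length($down) == 10;

    return 'b' if $up eq $down;                      # 10/10 perfect

    # The two contiguous 9-base windows of each 10-mer.
    my @up9   = map { substr($up,   $_, 9) } 0, 1;
    my @down9 = map { substr($down, $_, 9) } 0, 1;

    for my $u (@up9) {                               # 9/9 perfect
        for my $d (@down9) { return 'a' if $u eq $d }
    }

    return 'e' if hamming($up, $down) == 1;          # 10/10, one substitution

    for my $d (@down9) { return 'c' if one_deletion_apart($up,   $d) }  # indel
    for my $u (@up9)   { return 'c' if one_deletion_apart($down, $u) }

    for my $u (@up9) {                               # 9/9, one substitution
        for my $d (@down9) { return 'd' if hamming($u, $d) == 1 }
    }

    return 'none';
}

# Example usage with made-up sequences.
for my $pair ( [ 'ACGTACGTAC', 'ACGTACGTAC' ], [ 'ACGTACGTAC', 'ACGTTCGTAC' ] ) {
    print join("\t", @$pair, classify_pair(@$pair)), "\n";
}
```

If only accept/reject is needed, anything other than 'discard' or 'none' could be treated as a match. Note that the string XOR trick only makes sense for equal-length strings, which is why the indel case is handled separately.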