in reply to need help comparing DNA compliments
If you only needed to know if the complement string was (or was not) a true complement of the original, perhaps the simplest way would be to re-complement the complement and check for equality. The re-complementation can be acheived very easily using Perl's tr/// operator. eg.
my $dna = 'ACGTACGT'; my $cmp = 'TGCATGCA'; (my $recmp = $cmp) =~ tr[ACGT][TGCA]; print 'Matched' if $recmp eq $dna;
However, as you want to know what the failures were, you can still use the technique to simplify the process of finding them by creating the re-complemented string and then detecting the differences. You could do this with substr
#! perl -slw use strict; while( !eof DATA ) { my $dna = <DATA>; my $cmp = <DATA>; (my $temp = $cmp) =~ tr[ACGT][TGCA]; if ($dna eq $temp) { print "$cmp is an accurate compliment of \n$dna"; } elsif (length $dna != length $cmp) { print "$cmp is a different length to \n$dna"; } else { for my $pos(0 .. length $dna) { my $c1 = substr($dna, $pos, 1); my $c2 = substr($cmp, $pos, 1); print "$c1-$c2 (pos:$pos) is a mismatch" unless $c1 eq (substr($temp, $pos, 1)); } } } __DATA__ ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCAG ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCA ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCCTAGGCTATATCTGCAGACGATCAGCAGCAAACGGATATCGGATCTAG +GATCGATCAG
Output
C:\test>243661 TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCAG is an accurate compliment of ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCA is a different length to ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC C-C (pos:22) is a mismatch A-G (pos:25) is a mismatch A-A (pos:59) is a mismatch G-G (pos:70) is a mismatch C:\test>
Or you could go one stage further and use Perl's bit-wise string manipulations to find the differences and use the resultant string to indicate those differences in a simple manner.
#! perl -sw use strict; while( !eof DATA ) { my $dna = <DATA>; my $cmp = <DATA>; (my $temp = $cmp) =~ tr[ACGT][TGCA]; if ($dna eq $temp) { print "\nNo mismatches found\n"; print $dna; print $cmp; } else { ($temp ^= $dna) =~s[[^\0]][*]g; print "\nAsterists (*) indicate mismatches\n"; print $dna; print $temp, $/; print $cmp; } } __DATA__ ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCAG ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCA ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCCTAGGCTATATCTGCAGACGATCAGCAGCAAACGGATATCGGATCTAG +GATCGATCAG
Output
C:\test>243661-2 No mismatches found ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCAG Asterists (*) indicate mismatches ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC + ** TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCA Asterists (*) indicate mismatches ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC * * * +* TGACCATGTATCGATCGATATCCTAGGCTATATCTGCAGACGATCAGCAGCAAACGGATATCGGATCTAG +GATCGATCAG C:\test>
|
|---|