need help comparing DNA compliments

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.

Re: need help comparing DNA compliments
by jasonk (Parson) on Mar 17, 2003 at 15:50 UTC

I don't know how the pairing works, so this sample code only includes the pairs you specifically mentioned in your post, if there are other possible pairs you would have to add them to %good_pairs.

my %good_pairs = (
    'A' => 'T',
    'T' => 'A',
    'G' => 'C',
    'C' => 'G',
);
for(0 .. $#dna) {
    if($dna[$_] ne $good_pairs{$comp[$_]}) {
        print "mismatch: $dna[$_] $comp[$_]\n";
    }
}
[download]

We're not surrounded, we're in a target-rich environment!

[reply]
[d/l]

Re: need help comparing DNA compliments
by BrowserUk (Patriarch) on Mar 17, 2003 at 17:35 UTC

If you only needed to know if the complement string was (or was not) a true complement of the original, perhaps the simplest way would be to re-complement the complement and check for equality. The re-complementation can be acheived very easily using Perl's tr/// operator. eg.

 
my $dna = 'ACGTACGT';
my $cmp = 'TGCATGCA';
(my $recmp = $cmp) =~ tr[ACGT][TGCA];
print 'Matched' if $recmp eq $dna;
[download]

However, as you want to know what the failures were, you can still use the technique to simplify the process of finding them by creating the re-complemented string and then detecting the differences. You could do this with substr

#! perl -slw
use strict;

while( !eof DATA ) {
    my $dna = <DATA>;
    my $cmp = <DATA>;
    (my $temp = $cmp) =~ tr[ACGT][TGCA];
    if ($dna eq $temp) {
        print "$cmp is an accurate compliment of \n$dna";
    } elsif (length $dna != length $cmp) {
        print "$cmp is a different length to \n$dna";
    } else {
        for my $pos(0 .. length $dna) {
            my $c1 = substr($dna, $pos, 1);
            my $c2 = substr($cmp, $pos, 1);
            print "$c1-$c2 (pos:$pos) is a mismatch"
                unless $c1 eq (substr($temp, $pos, 1));
        }
    }
}


__DATA__
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC
TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG
+CATCGATCAG
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC
TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG
+CATCGATCA
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC
TGACCATGTATCGATCGATATCCTAGGCTATATCTGCAGACGATCAGCAGCAAACGGATATCGGATCTAG
+GATCGATCAG
[download]

Output

C:\test>243661
TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG
+CATCGATCAG
 is an accurate compliment of
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC

TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG
+CATCGATCA
 is a different length to
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC

C-C (pos:22) is a mismatch
A-G (pos:25) is a mismatch
A-A (pos:59) is a mismatch
G-G (pos:70) is a mismatch

C:\test>
[download]

Or you could go one stage further and use Perl's bit-wise string manipulations to find the differences and use the resultant string to indicate those differences in a simple manner.

#! perl -sw
use strict;

while( !eof DATA ) {
    my $dna = <DATA>;
    my $cmp = <DATA>;
    (my $temp = $cmp) =~ tr[ACGT][TGCA];

    if ($dna eq $temp) {
        print "\nNo mismatches found\n";
        print $dna;
        print $cmp;
    } else {
        ($temp ^= $dna) =~s[[^\0]][*]g;
        print "\nAsterists (*) indicate mismatches\n";
        print $dna;
        print $temp, $/;
        print $cmp;
    }
}


__DATA__
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC
TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG
+CATCGATCAG
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC
TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG
+CATCGATCA
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC
TGACCATGTATCGATCGATATCCTAGGCTATATCTGCAGACGATCAGCAGCAAACGGATATCGGATCTAG
+GATCGATCAG
[download]

Output

C:\test>243661-2

No mismatches found
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC
TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG
+CATCGATCAG

Asterists (*) indicate mismatches
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC
                                                                      
+         **
TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG
+CATCGATCA

Asterists (*) indicate mismatches
ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC
+GTAGCTAGTC
                      *  *                                 *          
+*
TGACCATGTATCGATCGATATCCTAGGCTATATCTGCAGACGATCAGCAGCAAACGGATATCGGATCTAG
+GATCGATCAG

C:\test>
[download]

Examine what is said, not who speaks.

1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.

[reply]
[d/l]
[select]

Re: need help comparing DNA compliments
by perlguy (Deacon) on Mar 17, 2003 at 15:53 UTC

my $string1 = 'ACTG';
my $string2 = 'TGAT';

my %corresponding = ('A' => 'T',
                     'T' => 'A',
                     'C' => 'G',
                     'G' => 'C');

my $count = 0;
for (split '', $string1) {
   my $char = substr($string2, $count++, 1);
   if ($char ne $corresponding{$_}) {
      print "Position $count incorrect:  1: $_, 2: $char\n";
   }
}
[download]

[reply]
[d/l]

Re: need help comparing DNA compliments
by benn (Vicar) on Mar 17, 2003 at 16:00 UTC

my @dna = split("","ATCTGCG"); 
my @comp = split("","TCTGCAA"); 
my %pairs = qw(A T T A G C C G); 
foreach (0..$#dna) {
    my $d = $dna[$_];
    my $c = $comp[$_];
    print "$_: $d - $c is a mispair\n" unless ($pairs{$d}  eq $c);
}
[download]

Update

[reply]
[d/l]

Re: need help comparing DNA compliments
by broquaint (Abbot) on Mar 17, 2003 at 16:01 UTC

my @pairs = ( [qw/ATCG TAGC/], [qw/ACTG TGAT/] );

print "@$_: ", (dna_compare(@$_) ? "matched" : "didn't matched"), $/
  for @pairs;

sub dna_compare {
  my($dna, $comp) = @_;

  return !scalar grep {
    (substr($dna, $_, 1).substr($comp, $_, 1))
     !~ m< ^ (?: AT | TA | GC | CG ) \z >x;
  } 0 .. (length($dna) - 1);
}

__output__

ATCG TAGC: matched
ACTG TGAT: didn't matched
[download]

_________ broquaint

[reply]
[d/l]

Re: need help comparing DNA compliments
by zby (Vicar) on Mar 17, 2003 at 16:01 UTC

%pairs

for my $i (0 .. $#dna){
    if(! exists $pairs{$dna[$i].$comp[$i]}){
        print "mismatch: ....";
    }
}
[download]

Disclaimer:

[reply]
[d/l]
[select]

Re: need help comparing DNA compliments
by dakkar (Hermit) on Mar 17, 2003 at 17:26 UTC

First, you are not comparing strings, you are comparing arrays.

As this is not C, you don't need a for loop to get the length of an array: just say:

$counter=@dna;$counter2=@comp;
[download]

context

And your cycles are a bit out of whack... you are saying:

if @dna is not empty (while (@dna))
count how many bases I have in @dna
count how many bases I have in @comp
check if the one past last element of each list is a correct pairing
etc

I'll not write here a "correct" solution, since many other people already have. I'll just write a (commented) snippet for the case of strings:

my $dna; # the sequence, as a string
my $comp; # the complement, as a string
my $c2;
($c2=$comp)=~tr[ATGC][TACG]; # copy $comp into $c2, then exchange each
+ base in $c2 with its complement (using the tr-ansliteration operator
+)
print "Match!\n" if $dna eq $c2; # since $c2 is $comp's complement, it
+ should be equal to $dna
[download]

-- 
        dakkar - Mobilis in mobile

[reply]
[d/l]
[select]

Re: need help comparing DNA compliments
by sabkat (Acolyte) on Mar 17, 2003 at 22:39 UTC

I code in perl on the MAGPIE project (MAGPIE) and would be glad to help you out!

[reply]

Re: need help comparing DNA compliments
by CompuKid101 (Novice) on Mar 17, 2003 at 21:10 UTC

I created a program on my calculator to do exactly this. Takes the DNA or mRNA or tRNA and outputs any of those.

Impress your teacher by making a Human Genome viewer with the data from UCSC

[reply]

Re: need help comparing DNA compliments
by jryan (Vicar) on Mar 19, 2003 at 07:28 UTC

Algorithm::Diff

[reply]


Welcome to the Monastery
	PerlMonks