Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

need help comparing DNA compliments

by Anonymous Monk
on Mar 17, 2003 at 15:33 UTC ( [id://243661]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I wondered if someone would be able to help me. I simply want to compare two strings of dna (complements of each other) and determine whether they are correctly matched. e.g given two strings
ATCG.. TAGC..
Compare the first position in each, second position etc...However, if a base in the string does'nt correspond to its correct complement in the second string i want to print out that base pair.(correct pairings are A-T and G-C) e.g.
ACTG.. TGAT.. G-T is a mispair
I thought about using substr but don't know how to use substr to compare two strings. Here is a snippet of what i've tried. Much thanks ;-)
my @dna; # contains the sequence my @comp; # contains the complement while (@dna) { foreach my $base (@dna) { ++$counter; } foreach my $base2 (@comp) { ++$counter2; } if (($base[$counter] == 'A') && ($base2[$counter2] !~ 'T') { print "mismatch : $base[$counter] \t $base2[$counter2]\n +"; } # etc }
but obviuously using this approach i will have to type out the same line for each base in the sequence.

update (broquaint): title change (was newbie seeks help comparing strings)

Replies are listed 'Best First'.
Re: need help comparing DNA compliments
by jasonk (Parson) on Mar 17, 2003 at 15:50 UTC

    I don't know how the pairing works, so this sample code only includes the pairs you specifically mentioned in your post, if there are other possible pairs you would have to add them to %good_pairs.

    my %good_pairs = ( 'A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G', ); for(0 .. $#dna) { if($dna[$_] ne $good_pairs{$comp[$_]}) { print "mismatch: $dna[$_] $comp[$_]\n"; } }

    We're not surrounded, we're in a target-rich environment!
Re: need help comparing DNA compliments
by BrowserUk (Patriarch) on Mar 17, 2003 at 17:35 UTC

    If you only needed to know if the complement string was (or was not) a true complement of the original, perhaps the simplest way would be to re-complement the complement and check for equality. The re-complementation can be acheived very easily using Perl's tr/// operator. eg.

    my $dna = 'ACGTACGT'; my $cmp = 'TGCATGCA'; (my $recmp = $cmp) =~ tr[ACGT][TGCA]; print 'Matched' if $recmp eq $dna;

    However, as you want to know what the failures were, you can still use the technique to simplify the process of finding them by creating the re-complemented string and then detecting the differences. You could do this with substr

    #! perl -slw use strict; while( !eof DATA ) { my $dna = <DATA>; my $cmp = <DATA>; (my $temp = $cmp) =~ tr[ACGT][TGCA]; if ($dna eq $temp) { print "$cmp is an accurate compliment of \n$dna"; } elsif (length $dna != length $cmp) { print "$cmp is a different length to \n$dna"; } else { for my $pos(0 .. length $dna) { my $c1 = substr($dna, $pos, 1); my $c2 = substr($cmp, $pos, 1); print "$c1-$c2 (pos:$pos) is a mismatch" unless $c1 eq (substr($temp, $pos, 1)); } } } __DATA__ ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCAG ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCA ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCCTAGGCTATATCTGCAGACGATCAGCAGCAAACGGATATCGGATCTAG +GATCGATCAG

    Output

    C:\test>243661 TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCAG is an accurate compliment of ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCA is a different length to ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC C-C (pos:22) is a mismatch A-G (pos:25) is a mismatch A-A (pos:59) is a mismatch G-G (pos:70) is a mismatch C:\test>

    Or you could go one stage further and use Perl's bit-wise string manipulations to find the differences and use the resultant string to indicate those differences in a simple manner.

    #! perl -sw use strict; while( !eof DATA ) { my $dna = <DATA>; my $cmp = <DATA>; (my $temp = $cmp) =~ tr[ACGT][TGCA]; if ($dna eq $temp) { print "\nNo mismatches found\n"; print $dna; print $cmp; } else { ($temp ^= $dna) =~s[[^\0]][*]g; print "\nAsterists (*) indicate mismatches\n"; print $dna; print $temp, $/; print $cmp; } } __DATA__ ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCAG ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCA ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCCTAGGCTATATCTGCAGACGATCAGCAGCAAACGGATATCGGATCTAG +GATCGATCAG

    Output

    C:\test>243661-2 No mismatches found ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCAG Asterists (*) indicate mismatches ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC + ** TGACCATGTATCGATCGATATCGTATGCTATATCTGCAGACGATCAGCAGCAAACGGATTTCGGATCTAG +CATCGATCA Asterists (*) indicate mismatches ACTGGTACATAGCTAGCTATAGCATACGATATAGACGTCTGCTAGTCGTCGTTTGCCTAAAGCCTAGATC +GTAGCTAGTC * * * +* TGACCATGTATCGATCGATATCCTAGGCTATATCTGCAGACGATCAGCAGCAAACGGATATCGGATCTAG +GATCGATCAG C:\test>

    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: need help comparing DNA compliments
by perlguy (Deacon) on Mar 17, 2003 at 15:53 UTC
    here's my go at it:
    my $string1 = 'ACTG'; my $string2 = 'TGAT'; my %corresponding = ('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C'); my $count = 0; for (split '', $string1) { my $char = substr($string2, $count++, 1); if ($char ne $corresponding{$_}) { print "Position $count incorrect: 1: $_, 2: $char\n"; } }
Re: need help comparing DNA compliments
by benn (Vicar) on Mar 17, 2003 at 16:00 UTC
    Looks suspiciouly like a homework assignment to me, but as you've actually posted code, here goes...:)

    The first two lines stuff the arrays with the individual letters - it's not obvious from your code whether this is initially the case or not.

    The '%pairs' hash demonstrates one way of avoiding large numbers of 'if' statements.
    my @dna = split("","ATCTGCG"); my @comp = split("","TCTGCAA"); my %pairs = qw(A T T A G C C G); foreach (0..$#dna) { my $d = $dna[$_]; my $c = $comp[$_]; print "$_: $d - $c is a mispair\n" unless ($pairs{$d} eq $c); }
    Hope this helps.

    Update ROFLMAO! Marvellous how so many people can come up with the same routine written so many ways in 5 minutes :) Go monks! :)
Re: need help comparing DNA compliments
by broquaint (Abbot) on Mar 17, 2003 at 16:01 UTC
    If you're just comparing 2 strings then this should do the trick
    my @pairs = ( [qw/ATCG TAGC/], [qw/ACTG TGAT/] ); print "@$_: ", (dna_compare(@$_) ? "matched" : "didn't matched"), $/ for @pairs; sub dna_compare { my($dna, $comp) = @_; return !scalar grep { (substr($dna, $_, 1).substr($comp, $_, 1)) !~ m< ^ (?: AT | TA | GC | CG ) \z >x; } 0 .. (length($dna) - 1); } __output__ ATCG TAGC: matched ACTG TGAT: didn't matched

    HTH

    _________
    broquaint

Re: need help comparing DNA compliments
by zby (Vicar) on Mar 17, 2003 at 16:01 UTC
    How about having a hash of proper pairs let it have the name %pairs? And than comparing a concatenation of the letters from the first and second string like this
    for my $i (0 .. $#dna){ if(! exists $pairs{$dna[$i].$comp[$i]}){ print "mismatch: ...."; } }
    Disclaimer: I don't anderstand your code - but from your description I think it is what you wanted.
Re: need help comparing DNA compliments
by dakkar (Hermit) on Mar 17, 2003 at 17:26 UTC

    First, you are not comparing strings, you are comparing arrays.

    As this is not C, you don't need a for loop to get the length of an array: just say:

    $counter=@dna;$counter2=@comp;
    Assigning an array to a scalar assigns the length of the array to the scalar (OK, we're talking about context here, but I'm not going into a detailed discussion)

    And your cycles are a bit out of whack... you are saying:

    1. if @dna is not empty (while (@dna))
    2. count how many bases I have in @dna
    3. count how many bases I have in @comp
    4. check if the one past last element of each list is a correct pairing
    5. etc

    I'll not write here a "correct" solution, since many other people already have. I'll just write a (commented) snippet for the case of strings:

    my $dna; # the sequence, as a string my $comp; # the complement, as a string my $c2; ($c2=$comp)=~tr[ATGC][TACG]; # copy $comp into $c2, then exchange each + base in $c2 with its complement (using the tr-ansliteration operator +) print "Match!\n" if $dna eq $c2; # since $c2 is $comp's complement, it + should be equal to $dna

    -- 
            dakkar - Mobilis in mobile
    
Re: need help comparing DNA compliments
by sabkat (Acolyte) on Mar 17, 2003 at 22:39 UTC
    Any reason you dont just BLAST the sequence together?

    I code in perl on the MAGPIE project (MAGPIE) and would be glad to help you out!

Re: need help comparing DNA compliments
by CompuKid101 (Novice) on Mar 17, 2003 at 21:10 UTC
    It seems this time of the year we are all getting into genetics/DNA. We just finished transforming E. coli with pGLO plasmid so the bacteria produce GFP and glow under UV light. We are now just starting a DNA fingerprinting project.

    I created a program on my calculator to do exactly this. Takes the DNA or mRNA or tRNA and outputs any of those.

    Impress your teacher by making a Human Genome viewer with the data from UCSC

Re: need help comparing DNA compliments
by jryan (Vicar) on Mar 19, 2003 at 07:28 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://243661]
Approved by diotalevi
Front-paged by diotalevi
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (4)
As of 2024-04-23 22:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found