Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Firstly i'd like to thank everyone that took the time to help me yesterday. I now have another slight problem....on the same lines. Anyway, i am trying to align two short sequences of DNA and mark on mismatches, which i have done. Yesterday i got help trying to interleave and wrap the output to print 60 chars per line.. my problem is now that i need to interleave the mismatch markings aswell, this part isn't working. the snippet of code that i have is below;
my $mismatches = get_mismatches ($genome1, $genome2); my @strings = ($genome1, $mismatches, $genome2); for (my $x = 0; $x < length ($strings [0]); $x +=60) { print join ("\n", map (substr ($strings[$_], $x, 60), 0..2 )). "\n\n +"; } sub get_mismatches { ($genome1, $genome2) = @_; my ($length) = length ($genome1); my $position; my $count = 0; print "\n\n"; for ($position = 0; $position < $length; ++$position) { if (substr($genome1, $position, 1) eq substr ($genome2, $position +, 1)) { print "-"; } else { print "*"; } } return ($genome1, $genome2); }
The desired output would look like this;
ACACTTGCATACTGATC --*****-****---** ACTACGACTGCATGACT TGACTGCACT *********- CATGATATGT
but so far it looks like this;
ATGCATGACTACT CTGAATGCTGACT CGTCTGACTATAA ATGACTTGAC CTATGACTGA CCGATGACTG
so for some reason it thinks that $mismatches is a string of DNA nOt the combination of * and - that were in the subroutine. If anyone can help i'd really appreciate it.

Replies are listed 'Best First'.
Re: more interweaving
by BrowserUk (Patriarch) on Nov 19, 2002 at 10:34 UTC

    Here's your code modified to do what you want. The changes are pretty simple, but you need to look carefully, work out what changed and understand why they are necessary if you are going to do anything more complex than this.

    #! perl -slw use strict; my $genome1 = 'ACACTTGCATACTGATCATGACTTGACACACTTGCATACTGATCATGACTTGACACACTTGCATACTGA +TCATGACTTGACACACTTGCATACTGATCATGACTTGAC'; my $genome2 = 'ACTACGACTGCATGACTCCGATGACTGACTACGACTGCATGACTCCGATGACTGACTACGACTGCATGA +CTCCGATGACTGACTACGACTGCATGACTCCGATGACTG'; my $mismatches = get_mismatches ($genome1, $genome2); my @strings = ($genome1, $mismatches, $genome2); for (my $x = 0; $x < length ($strings [0]); $x +=60) { print join ("\n", map (substr ($strings[$_], $x, 60), 0..2 )) . " +\n"; } exit; sub get_mismatches { ($genome1, $genome2) = @_; my $mismatches = ''; my $length= length ($genome1); my $position; my $count = 0; for ($position = 0; $position < $length; ++$position) { if (substr($genome1, $position, 1) eq substr ($genome2, $posit +ion, 1)) { $mismatches .= "-"; } else { $mismatches .= "*"; } } return $mismatches; } __END__ C:\test>214064 ACACTTGCATACTGATCATGACTTGACACACTTGCATACTGATCATGACTTGACACACTT --*****-****---****--******--*****-****---****--******--**** ACTACGACTGCATGACTCCGATGACTGACTACGACTGCATGACTCCGATGACTGACTACG GCATACTGATCATGACTTGACACACTTGCATACTGATCATGACTTGAC *-****---****--******--*****-****---****--****** ACTGCATGACTCCGATGACTGACTACGACTGCATGACTCCGATGACTG C:\test>

    Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
    Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
    Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
    Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

      Thanks BrowserUk i can see what you've done.... As I have just started to learn perl simple solutions like this don't always seem that obvious. ;-)
Re: more interweaving
by Bukowski (Deacon) on Nov 19, 2002 at 10:35 UTC
    Hi,

    Can I just ask why you are intent on reinventing the wheel here? Pairwise sequence alignment is not a trivial problem, and unless this is a 'homework' question I think you would be better off employing an already established alignment tool and then parsing the results.

    Proper pairwise sequence comparison needs careful consideration of gap penalties, a scoring system for matches (preferably a scoring matrix) and a dip into the world of dynamic programming.

    Have you considered using ALIGN, FASTA or BLAST and parsing the results? You may also be interested in BioPerl a set of perl modules for bioinformatics.

    I apologise for not answering the question in hand, I just don't want to see you banging your head against a wall with a solution to a reasonably complex problem which may be flawed.

    Bukowski - aka Dan (dcs@black.hole-in-the.net)
    "Coffee for the mind, Pizza for the body, Sushi for the soul" -Userfriendly