nicholaspr has asked for the wisdom of the Perl Monks concerning the following question:

Hi all I have a program that reads pairs of sequences and I want to perform a global sequence alignment and then from the alignment get the number of gaps and substitutions. How do I do this without having to include input and output files.. Thank you Nicholas

Replies are listed 'Best First'.
Re: Sequence alignment
by planetscape (Chancellor) on Jan 25, 2010 at 15:49 UTC
      Hi How do I download Align::NW
Re: Sequence alignment
by biohisham (Priest) on Jan 25, 2010 at 16:02 UTC
    If you can implement the Needeman-Wunsch dynamic algorithm you could get the gaps but not to reinvent the wheel, there is a CPAN module Algorithm::NeedlemanWunsch for that.

    As for the substitutions, I am not clear on how strict your criteria are and the type of sequences you have but maybe FindMod at ExPasy can be a choice...

    BLASTing would conduct local similarity searches which are not the OP's goal..


    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.
Re: Sequence alignment
by Corion (Patriarch) on Jan 25, 2010 at 15:00 UTC

    I don't know, but then, I'm not actually fluent in the terms of biology and computations relating to that. What did the Google search for alignment on this site tell you and how are the solutions/discussions there not helping?

    If they are not helping, maybe you can explain to us what your data structure is, and what an alignment is and how one alignment would be preferrable over another alignment.

    I guess you've looked at BioPerl already and found none of its methods sufficient for your task?

Re: Sequence alignment
by erix (Prior) on Jan 25, 2010 at 15:02 UTC

    I think this is best left to blast. After blast has run, you can read the resulting output with perl. BioPerl can help with (either or both) calling the blast binary, and/or reading its output.

    See the bioperl HOWTOs, esp. SearchIO

Re: Sequence alignment
by snape (Pilgrim) on Jan 25, 2010 at 15:34 UTC

    I guess u re talking about the Needleman Wunsch algorithm. I can help you in working out some steps but I think u need to define the gap penalty, match and mismatch score prior to get the alignment.

    1. Build an empty matrix where you can calculate the score and also have a pointer for the tracing back so that you can calculate the matches and subsequently get the gaps/mismatches.

    2. Filling the matrix and then tracing back the matrix

    my @matrix; $matrix[0][0]{score} = 0; $matrix[0][0]{pointer} = "none"; for(my $j = 1; $j <= length($seq1); $j++) { $matrix[0][$j]{score} = $GAP * $j; $matrix[0][$j]{pointer} = "left"; } for (my $i = 1; $i <= length($seq2); $i++) { $matrix[$i][0]{score} = $GAP * $i; $matrix[$i][0]{pointer} = "up"; } # fill for(my $i = 1; $i <= length($seq2); $i++) { for(my $j = 1; $j <= length($seq1); $j++) { my ($diagonal_score, $left_score, $up_score); # calculate match score my $letter1 = substr($seq1, $j-1, 1); my $letter2 = substr($seq2, $i-1, 1); + if ($letter1 eq $letter2) { $diagonal_score = $matrix[$i-1][$j-1]{score} + $MATCH; } else { $diagonal_score = $matrix[$i-1][$j-1]{score} + $MISMATCH; } # calculate gap scores $up_score = $matrix[$i-1][$j]{score} + $GAP; $left_score = $matrix[$i][$j-1]{score} + $GAP; # choose best score if ($diagonal_score >= $up_score) { if ($diagonal_score >= $left_score) { $matrix[$i][$j]{score} = $diagonal_score; $matrix[$i][$j]{pointer} = "diagonal"; } else { $matrix[$i][$j]{score} = $left_score; $matrix[$i][$j]{pointer} = "left"; } } else { if ($up_score >= $left_score) { $matrix[$i][$j]{score} = $up_score; $matrix[$i][$j]{pointer} = "up"; } else { $matrix[$i][$j]{score} = $left_score; $matrix[$i][$j]{pointer} = "left"; } } } } # trace-back my $align1 = ""; my $align2 = ""; # start at last cell of matrix my $j = length($seq1); my $i = length($seq2); while (1) { last if $matrix[$i][$j]{pointer} eq "none"; # ends at first cell o +f matrix if ($matrix[$i][$j]{pointer} eq "diagonal") { $align1 .= substr($seq1, $j-1, 1); $align2 .= substr($seq2, $i-1, 1); $i--; $j--; } elsif ($matrix[$i][$j]{pointer} eq "left") { $align1 .= substr($seq1, $j-1, 1); $align2 .= "-"; $j--; } elsif ($matrix[$i][$j]{pointer} eq "up") { $align1 .= "-"; $align2 .= substr($seq2, $i-1, 1); $i--; } } $align1 = reverse $align1; $align2 = reverse $align2; print "$align1\n"; print "$align2\n";

    For match = 1, mismatch = -1 and gap = -1.

    Input

    ATGTAGACCTAGATCATGATGACTGATGAT ATTACCGATGACTGATGACTGATGACTGAT

    Output

    ATGTAGACCTA-GA-TCATGA-TGACTGA-TGAT AT-T--ACCGATGACTGATGACTGA-TGACTGAT
      How we simply add the code for display max score? I try but do not print correctly.
      why this code doesn't work with me !!!

        Although marto is quite right, perhaps I can make an educated guess here?

        The code as given by snape is incomplete. When I add this before the code:

        #! perl use strict; use warnings; my $MATCH = 1; my $MISMATCH = -1; my $GAP = -1; chomp(my $seq1 = <DATA>); chomp(my $seq2 = <DATA>);

        and this after the code:

        __DATA__ ATGTAGACCTAGATCATGATGACTGATGAT ATTACCGATGACTGATGACTGATGACTGAT

        I get the same output as reported by snape.

        Disclaimer: I have no knowledge of the Needleman Wunsch algorithm (if that’s what this code is implementing), nor of what it’s for. :-)

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,