Parsing Multiple Alignment -- using Perl for DNA

le_albatross has asked for the wisdom of the Perl Monks concerning the following question:

Hi all; I am fairly new to Perl. I'm trying to write a simple code that will parse 4 DNA sequences that have been aligned with gaps. I want the program to count up at nucleotide substitutions and ignore dashes '-' that denote gaps. I've come up with the following code, but the damn thing just runs forever, and doesn't seem to enter the if loops. Input file is fasta format. Any ideas?

 

@input = <INPUT>;
$query1_name = $input[0];
$query1 = $input[1];
$query2_name = $input[2];
$query2 = $input[3];
$query3_name = $input[4];
$query3 = $input[5];
$query4_name = $input[6];
$query4 = $input[7];

@query1 = split ('', $query1);
@query2 = split ('', $query2); 
@query3 = split ('', $query3);
@query4 = split ('', $query4);

$length1 = scalar @query1;
$length2 = scalar @query2;
$length3 = scalar @query3;
$length4 = scalar @query4;

while ($counter <= $length1)
{
    $query1nt = $query1[$counter];
     $query2nt = $query2[$counter];
     $query3nt = $query3[$counter];
     $query4nt = $query4[$counter];

    if ($query1nt ne $query2nt)
    {
        if ($query1nt ne '-' && $query2nt ne '-')
            {
            ++$count_sub;
             }
     }        
    elsif ($query1nt ne $query3nt)
    {
        if ($query1nt ne "-" && $query3nt ne "-")
            {
            ++$count_sub;
            }
    }
    elsif ($query1nt ne $query4nt)
    {
        if ($query1nt ne "-" && $query4nt ne "-")
            {
            ++$count_sub;
            }
    }        
    else
    {
        ++$count_same;
        print "else, hello\n";
        
    }    
}
$totallength = $count_sub + $count_same;

print "The number of subs is $count_sub. The number of matches is $cou
+nt_same.\n\n";

print "To check, $totallength should equal the length of the alignment
+.\n\n";

exit;
[download]

Comment on Parsing Multiple Alignment -- using Perl for DNA Download Code

Replies are listed 'Best First'.
Re: Parsing Multiple Alignment -- using Perl for DNA by Corion (Patriarch) on May 10, 2015 at 13:26 UTC
Where do you increment `$counter`?	[reply] [d/l]
Re^2: Parsing Multiple Alignment -- using Perl for DNA by le_albatross (Initiate) on May 15, 2015 at 15:20 UTC
FIXED thanks!	[reply]
Re: Parsing Multiple Alignment -- using Perl for DNA by Laurent_R (Canon) on May 11, 2015 at 18:45 UTC
Corion pointed to the probable most important mistake in your code. I would also add that `$counter` has also not been initialized (and not declared). Your code is an perfect example of what is often called "baby Perl", i.e. code using only a small subset of the language. There is nothing derogative in this expression, quite to the contrary: beginners are encouraged to write scripts in baby Perl, this is the right way to start learning. And most monks here (including myself), perhaps all monks, have started this way. I would like to offer a few possibly more "Perlish" ways of doing the same thing, but without trying to do too clever things: `@input = <INPUT>; $query1_name = $input[0]; $query1 = $input[1]; $query2_name = $input[2]; # ... $query4 = $input[7];` [download] You may want to use a data structure rather than 8 variables. The eight code lines above can be replaced by: `my @query; ($query[$_]{name}, $query[$_]{content}) = @input[2$_, 2$_ + 1] for 0 +..3;` [download] Which, for an @input array containing numbers fro 1 to 8, gives me the following data structure: `0 ARRAY(0x600500b60) 0 HASH(0x600500a88) 'content' => 2 'name' => 1 1 HASH(0x6005cffa8) 'content' => 4 'name' => 3 2 HASH(0x6005cffd8) 'content' => 6 'name' => 5 3 HASH(0x600635958) 'content' => 8 'name' => 7` [download] This structure is called an array of hashes (AoH): it is an array of four elements, in which each element is a reference to a hash with two keys. Similarly, you can change: `@query1 = split ('', $query1); @query2 = split ('', $query2); @query3 = split ('', $query3); @query4 = split ('', $query4);` [download] to this: `$query[$_]{split_content} = [ split '', $query[$_]{content} ] for 0..3 +;` [download] and for the lengths: `* $query[$_]{length} = scalar @{ $query[$_]{'split_content'}} for 0. +.3;` [download] although I am not sure this is needed since you seem to be using only `$length1`, but this may be another bug. Now the data structure is looking like this: `0 ARRAY(0x600500b60) 0 HASH(0x6005fdbb8) 'content' => 2 'length' => 1 'name' => 1 'split_content' => ARRAY(0x600635d30) 0 2 1 HASH(0x6005d0050) 'content' => 4 'length' => 1 'name' => 3 'split_content' => ARRAY(0x60063d850) 0 4 2 HASH(0x60063d928) 'content' => 6 'length' => 1 'name' => 5 'split_content' => ARRAY(0x6006360a8) 0 6 3 HASH(0x6006359d0) 'content' => 8 'length' => 1 'name' => 7 'split_content' => ARRAY(0x600636060) 0 8` [download] This: `while ($counter <= $length1) #...` [download] might be better written: `for my $counter (0..$length1) { ...` [download] which will do for you the incrementation missing in your code. I won't go further into your code, but I would add that you should should take the habit of declaring all your variables (with the `my` operator) and to insert the following pragmas at the top of your programs: `use strict; use warnings;` [download] Je suis Charlie.	[reply] [d/l] [select]
Re^2: Parsing Multiple Alignment -- using Perl for DNA by le_albatross (Initiate) on May 15, 2015 at 19:43 UTC
Thanks for the suggestions! I got the program to work by adding ++$counter; but I am going to go through your suggestions and implement them for my own knowledge.	[reply]