in reply to Re^3: Alignment and matrix generation
in thread Alignment and matrix generation

Dear Choroba

Thank you for your time. I modified the code as per your suggestions but it seems that the code is skipping the first block of alignment and rest the occurrences of nucleotides as percentages are wrong.

I am attaching the code and the data file for the same

#!/usr/bin/perl use strict; use warnings; use Syntax::Construct qw{ // }; my $endpos = 0; my ($startpos, $count); my %occurrences; my $file = $ARGV[0]; open(DATA, $file); while (<DATA>) { if (/^CLUSTAL.*/) {next;} if (/^ +$/) { $startpos = $endpos + 1; $count = 0; } elsif (/\s+ ([-actg]+) \s*$/x) { ++$count; my @nucleotides = split //, $1; $endpos = $endpos + length $1 if $startpos == $endpos + 1; for my $pos (0 .. $#nucleotides) { ++$occurrences{ $nucleotides[$pos] }[$startpos + $pos] unless '-' eq $nucleotides[$pos]; } } } for my $pos (1 .. $endpos) { print "$pos\t"; for my $nucleotide (sort keys %occurrences) { printf "%s\t%0.1f\t", uc $nucleotide, 100 * ($occurrences{$nuc +leotide}[$pos] // 0) / $count; } print "\n"; }

Data file

CLUSTAL O(1.2.1) multiple sequence alignment gnl|hbvcds|AB014370_PreC_P-A ----------------------------------- +------------------------- gnl|hbvcds|AB064314_PreC_P-A ----------------------------------- +------------------------- gnl|hbvcds|AB014384_C_P-C ----------------------------------- +------------------------- gnl|hbvcds|AB014385_C_P-C ----------------------------------- +------------------------- gnl|hbvcds|AB048701_PreS1_P-D atggggcagaatctttccaccagcaatcctctggg +attctttcccgaccatcagttggat gnl|hbvcds|AB078031_PreS1_P-D atggggcagaatctttccaccagcaaccctctggg +attctttcccgaccaccagttggat gnl|hbvcds|AB030513_S_P-A ----------------------------------- +------------------------a gnl|hbvcds|AB064314_S_P-A ----------------------------------- +------------------------c gnl|hbvcds|AB194947_PreS2_P-E ----------------------------------- +------------------------g gnl|hbvcds|AB194948_PreS2_P-E ----------------------------------- +------------------------g + gnl|hbvcds|AB014370_PreC_P-A tagagtctcctgagcattgctcacctcaccatact +gcactcaggcaagccattctctgct gnl|hbvcds|AB064314_PreC_P-A tagagtctcctgagcattgctcacctcaccatacg +gcactcaggcaagccattctctgct gnl|hbvcds|AB014384_C_P-C tagagtctccggaacattgttcacctcaccataca +gcactcaggcaagctattctgtgtt gnl|hbvcds|AB014385_C_P-C tagagtctccggaacattgttcacctcaccataca +gcactcaggcaagctattctgtgtt gnl|hbvcds|AB048701_PreS1_P-D gggtttttcttgttgacaagaatcctcacaatacc +gcagagtctagactcgtggtggact gnl|hbvcds|AB078031_PreS1_P-D gggtttttcttgttgacaagaatcctcacaatacc +gcagagtctagactcgtggtggact gnl|hbvcds|AB030513_S_P-A gggtttttcttgttgacaagaatcctcacaatacc +gcagagtctagactcgtggtggact gnl|hbvcds|AB064314_S_P-A gggtttttcttgttgacaagaatcctcacaatacc +gcagagtctagactcgtggtggact gnl|hbvcds|AB194947_PreS2_P-E gggtttttcttgttgacaaaaatcctcacaatacc +gcagagtctagactcgtggtggact gnl|hbvcds|AB194948_PreS2_P-E gggtttttcttgttgacaaaaatcctcacaatacc +gcagagtctagactcgtggtggact * * ** * * ****** **** +*** * * * *

Regards

Replies are listed 'Best First'.
Re^5: Alignment and matrix generation
by choroba (Cardinal) on Mar 31, 2015 at 15:57 UTC
    That's because your "actual data" from the previous reply was different to the data in the last one. To accommodate, just change the regex to match the empty lines. The exact form is left as an exercise to the reader.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Dear Choroba,

      Thank you for your suggestions.

      The only thing I changed in the code is the line:

      - if (/^ +$/) { + if (/^\s+$/) {

      It seems its working now, let me know if you find something else.

      Regards