Re: To count letters (%identity) in DNA alignment

My first thought was to use a hash-of-hashes structure, keyed first by letter and then by column position. If your data is as sparse as your sample input looks, this may save a little memory; otherwise, GrandFather's hash-of-arrays is more straightforward:

use strict;
use warnings;

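my %counts;    # $counts{letter}{position} = number of sequences with that letter there
my $max = 0;   # length of the longest sequence seen so far
while (<DATA>) {
    chomp;
    my $code = (split)[-1];    # the sequence is the last whitespace-separated field
    my $i = 0;
    for my $c (split //, $code) {
        $counts{$c}{$i++}++;
    }
    $max = $i if $i > $max;
}
$max--;    # convert the longest length into the last column index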

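# keys returns the letters in no particular order; use "sort keys %counts" for a stable one
for my $c (keys %counts) {
    print "$c ";
    for my $i (0 .. $max) {
        # positions where this letter never occurred have no entry, so print 0
        if (exists $counts{$c}{$i}) {
            print " $counts{$c}{$i}";
        }
        else {
            print ' 0';
        }
    }
    print "\n";
}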

__DATA__
fred  ATGTTGTAT
fred1 ATCTTATAT
fred2 ATCTTATAT

prints:

A  3 0 0 0 0 2 0 3 0
T  0 3 0 3 3 0 3 0 3
C  0 0 2 0 0 0 0 0 0
G  0 0 1 0 0 1 0 0 0
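For comparison, here is a rough sketch of the hash-of-arrays layout mentioned above. It is my own guess at the shape, not GrandFather's actual code: count_columns is a made-up helper name, and the //= operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hash-of-arrays variant: $counts->{$letter}[$position] holds the tally.
# count_columns is a hypothetical helper name used only for this sketch.
sub count_columns {
    my @lines = @_;
    my %counts;
    my $width = 0;    # length of the longest sequence
    for my $line (@lines) {
        my $seq = (split ' ', $line)[-1];    # sequence is the last field
        next unless defined $seq;            # skip blank lines
        my @chars = split //, $seq;
        $counts{ $chars[$_] }[$_]++ for 0 .. $#chars;
        $width = @chars if @chars > $width;
    }
    # Fill the gaps so every row has a defined count at each position.
    for my $row (values %counts) {
        $row->[$_] //= 0 for 0 .. $width - 1;
    }
    return \%counts;
}

my $counts = count_columns(
    'fred  ATGTTGTAT',
    'fred1 ATCTTATAT',
    'fred2 ATCTTATAT',
);

# sort keys gives a stable row order (A, C, G, T for this input).
print "$_  @{ $counts->{$_} }\n" for sort keys %$counts;
```

Each row is a plain array, so it can be printed with a single interpolation instead of an exists test per column. The price is that every letter carries a full-width array, which only matters if the data really is sparse.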

Re^2: To count letters (%identity) in DNA alignment by GrandFather (Saint) on Jan 27, 2009 at 02:13 UTC
"10K+ sequences" - it's probably not sparse. ;) Perl's payment curve coincides with its learning curve.