in reply to To count letters (%identity) in DNA alignment

My first thought was to use a hash-of-hashes structure. If your data is as sparse as your sample input looks, this may save a little memory; otherwise, GrandFather's hash-of-arrays is more straightforward. Here is the hash-of-hashes version:
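use strict;
use warnings;

my %counts;     # $counts{letter}{column} = number of occurrences
my $max = 0;    # length of the longest sequence seen so far

while (<DATA>) {
    chomp;
    my $code = (split)[-1];    # the sequence is the last field on the line
    my $i = 0;
    for my $c (split //, $code) {
        $counts{$c}{$i++}++;
    }
    $max = $i if $i > $max;
}
$max--;    # convert length to the index of the last column

# Note: hash key order is not guaranteed, so the rows may print in any order.
for my $c (keys %counts) {
    print "$c ";
    for my $i (0 .. $max) {
        if (exists $counts{$c}{$i}) {
            print " $counts{$c}{$i}";
        } else {
            print ' 0';
        }
    }
    print "\n";
}

__DATA__
fred ATGTTGTAT
fred1 ATCTTATAT
fred2 ATCTTATAT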

prints:

A  3 0 0 0 0 2 0 3 0
T  0 3 0 3 3 0 3 0 3
C  0 0 2 0 0 0 0 0 0
G  0 0 1 0 0 1 0 0 0
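
For comparison, here is a rough sketch of what the hash-of-arrays layout could look like on the same three sequences. This is not GrandFather's actual code; the helper name count_columns and the sorted output order are my own assumptions for illustration. It relies on the defined-or operator (//), so it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count letters per column
# using a hash of arrays. Keys are letters; each value is an array
# indexed by column position.
sub count_columns {
    my @seqs = @_;
    my %counts;
    my $width = 0;    # length of the longest sequence
    for my $seq (@seqs) {
        my @letters = split //, $seq;
        $width = @letters if @letters > $width;
        $counts{ $letters[$_] }[$_]++ for 0 .. $#letters;
    }
    return (\%counts, $width);
}

my ($counts, $width) = count_columns(qw(ATGTTGTAT ATCTTATAT ATCTTATAT));

for my $letter (sort keys %$counts) {
    # Columns where a letter never occurs are undef; report them as 0.
    my @row = map { $counts->{$letter}[$_] // 0 } 0 .. $width - 1;
    print "$letter  @row\n";
}
```

The per-letter arrays allocate a slot for every column up to the last one in which the letter appears, which is why this layout only costs extra memory when the counts are sparse. Sorting the keys just makes the row order repeatable.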

Re^2: To count letters (%identity) in DNA alignment
by GrandFather (Saint) on Jan 27, 2009 at 02:13 UTC

    "10K+ sequences" - it's probably not sparse. ;)


    Perl's payment curve coincides with its learning curve.