You could compress the sequences (substrings) into vectors like so:

    my $bitvec = pack "H*", ($string =~ tr/AGCT/1248/r);

The AGCT values will be interleaved in the resulting bitvector. Better yet, transform the full sequence.
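To make the interleaving concrete, here is a small sketch of what that encoding produces. The input string is made up for illustration, and the `/r` flag on `tr` needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical input, not taken from the thread.
my $string = "AGCT";

# A => 1, G => 2, C => 4, T => 8: one hex digit (one nybble) per base.
my $bitvec = pack "H*", ($string =~ tr/AGCT/1248/r);

# Expand the packed bytes into a string of '0'/'1' characters.
my $bits = unpack "B*", $bitvec;

print "$bits\n";    # 0001001001001000
```

Each group of four bits is one base, and within a group the bit order is T, C, G, A. Note that `pack "H*"` pads an odd-length string with a zero nybble, so a sequence of odd length yields four extra zero bits at the end.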
Convert the bitvectors into integer vectors and sum them up. Better yet, if memory allows, create one big vector from the full sequence (with 4*length elements) as the first step.
Then, using an appropriate module with overloading, you might simply (completely untested code):

    my @arrr = split //, unpack "B*", $bitvec;
    my $histogram = Math::GSL::Vector->new();
    my $sequence = Math::GSL::Vector->new( \@arrr );
    while (($pos,$freq) = each %points) {
        my $view = gsl_vector_subvector( $sequence, 4 * $pos, 4 * $slen );
        $histogram += $view * $freq;
        $sum_freq += $freq;
    }
    # normalize and uninterleave
    $histogram *= 1 / $sum_freq;
    @parts = part { $i++ % 4 } $histogram->as_list;

But then, there probably exist specific modules for histogramming...
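For comparison, the same weighted sum can be sketched in plain Perl, without Math::GSL. This is only an illustration of the idea, not a tuned implementation: the sequence, the substring length `$slen` and the contents of `%points` (start position => weight) are assumed values, and `%weighted` and `@base` are names introduced here.

```perl
use strict;
use warnings;

# Hypothetical inputs, not taken from the thread.
my $string = "AGCTAGGA";
my $slen   = 4;                     # substring length
my %points = ( 0 => 1, 2 => 3 );    # start position => weight

# One big 0/1 vector for the full sequence, four elements per base.
my @bits = split //, unpack "B*", pack "H*", ($string =~ tr/AGCT/1248/r);

# Weighted sum of the 4*$slen-element window that starts at each position.
my @histogram = (0) x ( 4 * $slen );
my $sum_freq  = 0;
while ( my ( $pos, $freq ) = each %points ) {
    for my $i ( 0 .. 4 * $slen - 1 ) {
        $histogram[$i] += $freq * $bits[ 4 * $pos + $i ];
    }
    $sum_freq += $freq;
}

# Normalize by the total weight.
$_ /= $sum_freq for @histogram;

# Uninterleave: within each group of four bits the order is T, C, G, A.
my @base = qw(T C G A);
my %weighted;
for my $i ( 0 .. $#histogram ) {
    push @{ $weighted{ $base[ $i % 4 ] } }, $histogram[$i];
}

printf "%s: %s\n", $_, join " ", @{ $weighted{$_} } for qw(A G C T);
```

With these inputs the substrings are AGCT (weight 1) and CTAG (weight 3), so the line for A comes out as `0.25 0 0.75 0`: A appears at offset 0 with weight 1/4 and at offset 2 with weight 3/4.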
In reply to Re: Weighted frequency of characters in an array of strings
by Anonymous Monk
in thread Weighted frequency of characters in an array of strings
by K_Edw