in reply to How to calculate sum of a column by window size based on another column
If the rows are not ordered by pos then you need to keep a total for each bin. Consider:
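#!/usr/bin/perl
use warnings;
use strict;
use 5.010;

my $binSize = 20;
my %binTotals;

while (<DATA>) {
    chomp;
    # Fields in the DATA block are tab-separated: chr, pos, coverage
    my ($chr, $pos, $coverage) = split /\t/;
    # Positions 1..20 fall in bin 0, 21..40 in bin 1, and so on
    $binTotals{int(($pos - 1) / $binSize)} += $coverage;
}

# Label each bin by its upper bound and print the bins in numeric order
printf "%4d %d\n", $binSize * (1 + $_), $binTotals{$_}
    for sort {$a <=> $b} keys %binTotals;

__DATA__
chr	1	2
chr	4	2
chr	7	5
chr	22	5
chr	24	6
chr	38	10
chr	44	10
chr	50	20
chr	57	25
chr	60	30
chr	65	30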
Prints:
  20 9
  40 21
  60 85
  80 30
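To make the bin arithmetic explicit, here is a minimal sketch of the two calculations used above: the hash key int(($pos - 1) / $binSize) and the printed label $binSize * (1 + bin). The helper names bin_index and bin_label are made up for illustration, and positions are assumed to be 1-based positive integers.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not part of the script above.
# Assumes positions are 1-based positive integers.
sub bin_index {
    my ($pos, $binSize) = @_;
    return int(($pos - 1) / $binSize);    # 1..20 -> 0, 21..40 -> 1, ...
}

sub bin_label {
    my ($bin, $binSize) = @_;
    return $binSize * ($bin + 1);         # upper bound of the bin
}

for my $pos (1, 20, 21, 40, 41) {
    my $bin = bin_index($pos, 20);
    printf "pos %2d -> bin %d (label %d)\n", $pos, $bin, bin_label($bin, 20);
}
# pos  1 -> bin 0 (label 20)
# pos 20 -> bin 0 (label 20)
# pos 21 -> bin 1 (label 40)
# pos 40 -> bin 1 (label 40)
# pos 41 -> bin 2 (label 60)
```

Subtracting 1 before dividing is what keeps position 20 in the first bin rather than pushing it into the second.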
And actually that is a lot cleaner than code that assumes the rows are ordered by pos and so doesn't need to accumulate a total for every bin:
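# Uses the same header (use 5.010 for //) and __DATA__ block as above
my $binSize = 20;
my $bin     = 0;
my $total   = 0;

# At end of input $line becomes ''; the loop then runs once more to
# flush the final bin and exits via "last"
while ((my $line = <DATA> // '') || defined $bin) {
    chomp $line;
    my ($chr, $pos, $coverage) = split /\t/, $line;
    my $thisBin;

    $thisBin = int(($pos - 1) / $binSize) if defined $pos;

    # A new bin (or end of input) means the current bin is complete
    if (! defined $thisBin || $bin != $thisBin) {
        printf "%4d %d\n", $binSize * (1 + $bin), $total;
        last if ! defined $thisBin;

        $bin   = $thisBin;
        $total = 0;
    }

    $total += $coverage;
}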
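For comparison, here is one possible way to write the ordered-input variant without the extra pass through the loop at end of input: flush a bin whenever the bin number changes, then flush once more after the loop. This is only a sketch, not the code from this post. The name bin_sorted_rows is hypothetical, and it assumes the rows are tab-separated strings already sorted by pos.

```perl
use strict;
use warnings;

# Sketch only: bin_sorted_rows is a hypothetical name. Assumes each row is
# "chr<TAB>pos<TAB>coverage" and that rows are already sorted by pos.
sub bin_sorted_rows {
    my ($binSize, @rows) = @_;
    my @out;
    my ($bin, $total);

    for my $row (@rows) {
        chomp $row;
        my (undef, $pos, $coverage) = split /\t/, $row;
        my $thisBin = int(($pos - 1) / $binSize);

        # Bin changed: record the finished bin and start a new total
        if (defined $bin && $thisBin != $bin) {
            push @out, [$binSize * (1 + $bin), $total];
            $total = 0;
        }
        $bin = $thisBin;
        $total += $coverage;
    }

    # Record the last bin, if there was any input at all
    push @out, [$binSize * (1 + $bin), $total] if defined $bin;
    return @out;
}

printf "%4d %d\n", @$_
    for bin_sorted_rows(20, "chr\t1\t2", "chr\t4\t2", "chr\t22\t5");
#   20 4
#   40 5
```

Unlike the loop above, this sketch starts from the first bin that actually appears in the data, so it prints no empty leading bin when the first row falls beyond position 20.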