The general idea is right, but you could simplify your average calculation.
    use List::Util qw( sum );

    my %data;
    while (<>) {
        chomp;
        my ($k, $v) = split /\t/;
        push @{ $data{$k} }, $v;
    }

    local $, = "\t";
    local $\ = "\n";
    for my $k (keys %data) {
        my $data = $data{$k};
        print $k, 0+@$data, sum(@$data)/@$data;
    }
Memory usage shouldn't be a problem with 300,000 lines, but you could reduce it by summing and counting the elements as you go along.
    my %data;
    while (<>) {
        chomp;
        my ($k, $v) = split /\t/;
        $data{$k}[0]++;         # count
        $data{$k}[1] += $v;     # sum
    }

    local $, = "\t";
    local $\ = "\n";
    for my $k (keys %data) {
        my $data = $data{$k};
        print $k, $data->[0], $data->[1]/$data->[0];
    }
If the keys are sorted (or at least grouped) in the input, you could reduce memory usage to something constant.
    my $last;
    my $sum;
    my $count;

    local $, = "\t";
    local $\ = "\n";
    while (<>) {
        chomp;
        my ($k, $v) = split /\t/;
        if (defined($last) && $k ne $last) {
            # Key changed: report the finished group and reset.
            print $last, $count, $sum/$count;
            ($count, $sum) = (0, 0);
        }
        $last = $k;
        $count++;
        $sum += $v;
    }

    # Report the final group.
    if (defined($last)) {
        print $last, $count, $sum/$count;
    }
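To see why the grouping requirement matters, here is a minimal sketch of the same constant-memory idea wrapped in a subroutine that works on a list of lines instead of STDIN. The name `summarize_runs` and the sample data are made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: takes "key\tvalue" lines and returns one
# [key, count, mean] row per run of consecutive equal keys.
sub summarize_runs {
    my @lines = @_;
    my ($last, $count, $sum);
    my @rows;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($k, $v) = split /\t/, $copy;
        if (defined($last) && $k ne $last) {
            # Key changed: close out the previous run.
            push @rows, [ $last, $count, $sum / $count ];
            ($count, $sum) = (0, 0);
        }
        $last = $k;
        $count++;
        $sum += $v;
    }
    # Close out the final run, if there was any input at all.
    push @rows, [ $last, $count, $sum / $count ] if defined $last;
    return @rows;
}

# Grouped input: each key yields exactly one row.
my @grouped = summarize_runs("a\t1\n", "a\t3\n", "b\t10\n");
print join("\t", @$_), "\n" for @grouped;

# Ungrouped input: "a" is reported once per run, so it shows up twice.
my @ungrouped = summarize_runs("a\t1\n", "b\t10\n", "a\t3\n");
print join("\t", @$_), "\n" for @ungrouped;
```

With ungrouped input the averages are per run rather than per key, so sort the file by key first (or use the hash-based version) if the input order isn't guaranteed.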
As a one-liner, how about:

    perl -lane'
       $d{$F[0]}[0]++;
       $d{$F[0]}[1] += $F[1];
       }{
       $, = "\t";
       print $_, $d{$_}[0], $d{$_}[1]/$d{$_}[0]
          for keys %d;
    '

It can be shortened, but making it any shorter would hurt readability.
Update: I wasn't printing out the count. Fixed.
In reply to Re: Data munging
by ikegami
in thread Data munging
by umasuresh