in reply to Data munging
The general idea is right, but you could simplify your average calculation.
    use List::Util qw( sum );

    my %data;
    while (<>) {
        chomp;
        my ($k, $v) = split /\t/;
        push @{ $data{$k} }, $v;
    }

    local $, = "\t";
    local $\ = "\n";
    for my $k (keys %data) {
        my $data = $data{$k};
        print $k, 0+@$data, sum(@$data)/@$data;
    }
Memory usage shouldn't be a problem with 300,000 lines, but you could reduce it by keeping only a running count and sum per key as you go along, instead of storing every value.
    my %data;
    while (<>) {
        chomp;
        my ($k, $v) = split /\t/;
        $data{$k}[0]++;        # count
        $data{$k}[1] += $v;    # sum
    }

    local $, = "\t";
    local $\ = "\n";
    for my $k (keys %data) {
        my $data = $data{$k};
        print $k, $data->[0], $data->[1]/$data->[0];
    }
If the keys are sorted (or at least grouped) in the input, you could reduce memory usage to something constant.
    my $last;
    my $sum;
    my $count;

    local $, = "\t";
    local $\ = "\n";
    while (<>) {
        chomp;
        my ($k, $v) = split /\t/;
        if (!defined($last) || $k ne $last) {
            # Key changed: flush the previous group, then start a new one.
            print $last, $count, $sum/$count if defined($last);
            ($last, $count, $sum) = ($k, 0, 0);
        }
        $count++;
        $sum += $v;
    }

    # Flush the final group.
    if (defined($last)) {
        print $last, $count, $sum/$count;
    }
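To make the grouped approach easier to try out, here is a minimal sketch of the same logic wrapped in a subroutine and run against made-up sample data. The names `grouped_means` and `$emit` are hypothetical (they are not from the original post), and the sketch assumes tab-separated input whose keys are already grouped.

```perl
use strict;
use warnings;

# Hypothetical helper: reads "key\tvalue" lines from a filehandle whose
# keys are already grouped, and calls $emit->(key, count, mean) once per
# group. Only the current group's count and sum are held in memory.
sub grouped_means {
    my ($fh, $emit) = @_;
    my ($last, $count, $sum);
    while (my $line = <$fh>) {
        chomp $line;
        my ($k, $v) = split /\t/, $line;
        if (!defined($last) || $k ne $last) {
            # Key changed: report the finished group, then reset.
            $emit->($last, $count, $sum / $count) if defined $last;
            ($last, $count, $sum) = ($k, 0, 0);
        }
        $count++;
        $sum += $v;
    }
    # Report the final group, if there was any input at all.
    $emit->($last, $count, $sum / $count) if defined $last;
    return;
}

# Made-up sample input, read through an in-memory filehandle.
my $input = "a\t1\na\t3\nb\t10\nc\t2\nc\t4\nc\t6\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @rows;
grouped_means($fh, sub { push @rows, join("\t", @_) });
close $fh;

print "$_\n" for @rows;
```

Collecting the rows in `@rows` is only for demonstration. Printing directly inside the callback keeps memory use constant, as in the script above.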
As a one-liner, how about
    perl -lane'
        $d{$F[0]}[0]++;
        $d{$F[0]}[1] += $F[1];
        }{
        $, = "\t";
        print $_, $d{$_}[0], $d{$_}[1]/$d{$_}[0] for keys %d;
    '
It can be shortened further, but making it any terser would hurt readability.
Update: I wasn't printing out the count. Fixed.