The general idea is right, but you could simplify your average calculation.

use List::Util qw( sum );

my %data;
while (<>) {
    chomp;
    my ($k, $v) = split /\t/;
    push @{ $data{$k} }, $v;
}

local $, = "\t";
local $\ = "\n";
for my $k (keys %data) {
    my $data = $data{$k};
    print $k, 0+@$data, sum(@$data)/@$data;
}

Memory usage shouldn't be a problem with 300,000 lines, but you could reduce it by summing and counting the elements as you go along instead of storing every value.

my %data;
while (<>) {
    chomp;
    my ($k, $v) = split /\t/;
    $data{$k}[0]++;
    $data{$k}[1]+= $v;
}

local $, = "\t";
local $\ = "\n";
for my $k (keys %data) {
    my $data = $data{$k};
    print $k, $data->[0], $data->[1]/$data->[0];
}
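If you want to try the sum-and-count approach without an input file, here is a minimal sketch. The sub name `tally` and the sample lines are made up for illustration and are not from the original thread; the keys are sorted only to make the output order predictable.

```perl
use strict;
use warnings;

# Hypothetical helper: takes tab-separated "key\tvalue" lines and
# returns a hash ref of key => [ count, sum ].
sub tally {
    my %data;
    for my $line (@_) {
        chomp( my $copy = $line );
        my ($k, $v) = split /\t/, $copy;
        $data{$k}[0]++;         # running count for this key
        $data{$k}[1] += $v;     # running sum for this key
    }
    return \%data;
}

# Made-up sample input.
my @lines = ("a\t1\n", "b\t10\n", "a\t3\n", "b\t20\n", "a\t5\n");
my $data = tally(@lines);

local $, = "\t";
local $\ = "\n";
for my $k (sort keys %$data) {
    my ($count, $sum) = @{ $data->{$k} };
    print $k, $count, $sum / $count;
}
# Prints:
# a    3    3
# b    2    15
```

Only two numbers are kept per key, so memory grows with the number of distinct keys rather than with the number of lines.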

If the keys are sorted (or at least grouped) in the input, you could reduce memory usage to something constant.

my $last;
my $sum;
my $count;
local $, = "\t";
local $\ = "\n";
while (<>) {
    chomp;
    my ($k, $v) = split /\t/;
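    if (!defined($last) || $k ne $last) {
        # New key: flush the finished group (if any), then start over.
        print $last, $count, $sum/$count if defined($last);
        ($last, $count, $sum) = ($k, 0, 0);
    }
    $count++;
    $sum += $v;
}

# Flush the final group.
if (defined($last)) {
    print $last, $count, $sum/$count;
}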
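The grouped-input idea can also be sketched as a self-contained sub that hands each finished group to a callback, so only one group's totals are held at a time. The name `grouped_averages`, the callback interface, and the sample lines are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: walks "key\tvalue" lines whose keys are already
# grouped and calls $emit->($key, $count, $average) once per group.
sub grouped_averages {
    my ($emit, @lines) = @_;
    my ($last, $count, $sum);
    for my $line (@lines) {
        chomp( my $copy = $line );
        my ($k, $v) = split /\t/, $copy;
        if (!defined($last) || $k ne $last) {
            # Key changed: report the previous group, then reset.
            $emit->($last, $count, $sum / $count) if defined($last);
            ($last, $count, $sum) = ($k, 0, 0);
        }
        $count++;
        $sum += $v;
    }
    # Report the final group, if there was any input at all.
    $emit->($last, $count, $sum / $count) if defined($last);
    return;
}

# Made-up sample input, already grouped by key.
my @lines = ("a\t1\n", "a\t3\n", "b\t10\n");

local $, = "\t";
local $\ = "\n";
grouped_averages(sub { print @_ }, @lines);
# Prints:
# a    2    2
# b    1    10
```

If the same key shows up in two separate runs of the input, it is reported twice, which is why this version needs sorted or grouped input.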

As a one-liner, how about:

perl -lane'
    $d{$F[0]}[0]++;
    $d{$F[0]}[1]+= $F[1];
}{
    $, = "\t";
    print $_, $d{$_}[0], $d{$_}[1]/$d{$_}[0] for keys %d;
'

It can be shortened, but making it any shorter would hurt readability.

Update: I wasn't printing out the count. Fixed.

In reply to Re: Data munging by ikegami
in thread Data munging by umasuresh
