The general idea is right, but you could simplify your average calculation.

    use List::Util qw( sum );

    my %data;
    while (<>) {
        chomp;
        my ($k, $v) = split /\t/;
        push @{ $data{$k} }, $v;   # keep every value seen for this key
    }

    local $, = "\t";   # output field separator
    local $\ = "\n";   # output record separator
    for my $k (keys %data) {
        my $data = $data{$k};
        print $k, 0+@$data, sum(@$data)/@$data;   # key, count, average
    }

Memory usage shouldn't be a problem with 300,000 lines, but you could reduce it by summing and counting the elements as you go along instead of storing every value.

    my %data;
    while (<>) {
        chomp;
        my ($k, $v) = split /\t/;
        $data{$k}[0]++;        # count
        $data{$k}[1] += $v;    # sum
    }

    local $, = "\t";
    local $\ = "\n";
    for my $k (keys %data) {
        my $data = $data{$k};
        print $k, $data->[0], $data->[1]/$data->[0];   # key, count, average
    }
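One thing to keep in mind with either hash-based version: keys returns the keys in no particular order, so the output lines need not follow the order of the input. If that matters, sorting the keys is a small change. Here is a minimal sketch of that idea; the sub name averages_by_key and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of "key\tvalue" lines and returns
# one "key\tcount\taverage" string per key, sorted by key.
sub averages_by_key {
    my @lines = @_;
    my %data;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($k, $v) = split /\t/, $copy;
        $data{$k}[0]++;        # count
        $data{$k}[1] += $v;    # sum
    }
    return map { join "\t", $_, $data{$_}[0], $data{$_}[1] / $data{$_}[0] }
           sort keys %data;
}

# Made-up sample input, for illustration only.
print "$_\n" for averages_by_key("b\t4\n", "a\t1\n", "b\t6\n", "a\t2\n");
```

Sorting costs a little extra time at the end, but it makes the output reproducible from run to run.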

If the keys are sorted (or at least grouped) in the input, you could reduce memory usage to something constant.

    my $last;
    my $sum;
    my $count;

    local $, = "\t";
    local $\ = "\n";

    while (<>) {
        chomp;
        my ($k, $v) = split /\t/;

        # A new key starts a new group; flush the previous group first.
        if (!defined($last) || $k ne $last) {
            print $last, $count, $sum/$count if defined($last);
            ($last, $count, $sum) = ($k, 0, 0);
        }

        $count++;
        $sum += $v;
    }

    # Flush the final group.
    if (defined($last)) {
        print $last, $count, $sum/$count;
    }
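To try the grouped approach without a data file, the same logic can be wrapped in a sub and fed from an in-memory filehandle. This is only a sketch: the sub name grouped_averages, the callback interface, and the sample data are my own assumptions, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: reads "key\tvalue" lines from a filehandle whose keys
# are already grouped, and calls $emit->($key, $count, $avg) once per group.
sub grouped_averages {
    my ($fh, $emit) = @_;
    my ($last, $count, $sum);
    while (my $line = <$fh>) {
        chomp $line;
        my ($k, $v) = split /\t/, $line;
        if (!defined($last) || $k ne $last) {
            # Flush the previous group before starting a new one.
            $emit->($last, $count, $sum / $count) if defined $last;
            ($last, $count, $sum) = ($k, 0, 0);
        }
        $count++;
        $sum += $v;
    }
    # Flush the final group, if there was any input at all.
    $emit->($last, $count, $sum / $count) if defined $last;
    return;
}

# Made-up, already-grouped sample input read from an in-memory filehandle.
my $input = "a\t1\na\t2\nb\t4\nb\t6\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
grouped_averages($fh, sub { print join("\t", @_), "\n" });
close $fh;
```

Only one key, one count and one sum are held at any time, which is what keeps memory use constant. If the input is not actually grouped, a key that reappears later is reported as a separate group.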

As a one-liner, how about:

    perl -lane'
       $d{$F[0]}[0]++;
       $d{$F[0]}[1] += $F[1];
       }{
       $, = "\t";
       print $_, $d{$_}[0], $d{$_}[1]/$d{$_}[0] for keys %d;
    '

It can be shortened, but making it any shorter would hurt readability.

Update: I wasn't printing out the count. Fixed.


In reply to Re: Data munging by ikegami
in thread Data munging by umasuresh
