Yup, hashes are great for grouping. But since you have two data points for each date (count and size), you'll need some kind of 2d structure. I used a hash of hashes in the following:
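use strict;
use warnings;
use Text::CSV_XS qw( );

my $qfn = 'data.csv';  # hypothetical file name; substitute your own
open(my $fh, '<', $qfn)
    or die("Can't open $qfn: $!\n");

my %size_by_date;
my $csv = Text::CSV_XS->new({ binary => 1 });
while ( my $row = $csv->getline($fh) ) {
    my ($name, $date, $size) = @$row;
    ++$size_by_date{$date}{count};          # number of rows seen for this date
    $size_by_date{$date}{size} += $size;    # running total of sizes for this date
}
die("csv parse error: " . $csv->error_diag() . "\n")
    if !$csv->eof();

for my $date (sort keys %size_by_date) {
    my ($count, $size) = @{ $size_by_date{$date} }{qw( count size )};
    print("$date: $count, $size\n");
}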
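For anyone who finds the nested structure hard to picture, here is a minimal sketch of what %size_by_date ends up holding. The rows are made up and stand in for parsed CSV lines; they are not from the original poster's file.

```perl
use strict;
use warnings;

# Made-up sample rows (name, date, size), standing in for parsed CSV lines.
my @rows = (
    [ 'name1', '05/19/2009', 120 ],
    [ 'name2', '05/19/2009', 140 ],
    [ 'name3', '05/20/2009', 150 ],
);

my %size_by_date;
for my $row (@rows) {
    my ($name, $date, $size) = @$row;
    ++$size_by_date{$date}{count};          # inner hash is autovivified on first use
    $size_by_date{$date}{size} += $size;
}

# One line per date: the date, the row count, and the summed size.
for my $date (sort keys %size_by_date) {
    my $entry = $size_by_date{$date};
    print("$date: $entry->{count}, $entry->{size}\n");
}
```

Each date key maps to a small inner hash with a count and a size, so adding a third statistic later would only mean adding another inner key.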
Oh wow. A 2d structure seems so obvious now. I swear I bang my head against the wall sometimes cause it feels so good when I stop.
For a large number of columns in a tab-delimited or CSV file, it's quite easy to use existing modules to get the information you want. If the files are nicely structured, regardless of the number of columns, you can use the Data::CTable module. Combined with the Statistics::Descriptive module, it can give you much more information about your data. Let's say your data looks like this:
name,date,size
name1,date1,120
name2,date2,140
name3,date3,150
Here's some code, hopefully easy enough to understand:
use strict;
use Data::CTable;
use Statistics::Descriptive;
my $data = Data::CTable->new("data.txt"); # your csv file
$data->clean_ws(); # clean up whitespace
my $sizecolumn = $data->col('size'); # get column by name
my $stat = Statistics::Descriptive::Full->new();
$stat->add_data($sizecolumn);
print "sum of the column size: ", $stat->sum(), "\n";
It's up to you: you can use the Statistics::Descriptive module to get much more information (sum, mean, median, standard deviation, etc.) from your data, or, if you just need a simple sum, you can add up the elements of the array yourself.
perliff ----------------------
-with perl on my side
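If a plain total is all that's needed, the "add the elements yourself" route can be sketched with the core List::Util module. This is only an illustration: the array reference below is a hand-written stand-in (using the sample sizes above) for whatever col('size') returns, which is assumed here to be a reference to an array of numbers.

```perl
use strict;
use warnings;
use List::Util qw( sum0 );

# Stand-in for the array reference returned by $data->col('size').
my $sizecolumn = [ 120, 140, 150 ];

# sum0 behaves like sum but returns 0 instead of undef for an empty list.
my $total = sum0(@$sizecolumn);

print "sum of the column size: $total\n";
```

This avoids loading a statistics module when only the sum matters, at the cost of having to write any further statistics by hand.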
Another possible solution...
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my %hash;    # date => [ count, total size ]
my $csv = Text::CSV->new();
while ( my $line = <DATA> ) {
    if ( $csv->parse($line) ) {
        my @columns = $csv->fields();
        next if $columns[0] eq "Name";    # skip the header row
        $hash{ $columns[1] }[0]++;
        $hash{ $columns[1] }[1] += $columns[2];
    }
    else {
        warn "Could not parse line: $line";
    }
}
print "$_ -- $hash{$_}[0] -- $hash{$_}[1]\n" for sort keys %hash;
__DATA__
"Name","Date","size"
"Name One","05/19/2009","151397376"
"Name Two","05/19/2009","123333441"
"Name One","05/20/2009","183439993"
"Name Three","05/20/2009","8098123089"