zyzzogeton has asked for the wisdom of the Perl Monks concerning the following question:

I have a simple csv of names, dates and sizes, e.g.:

    "Name","Date","size"
    "Name One","05/19/2009","151397376"
    "Name Two","05/19/2009","123333441"
    "Name One","05/20/2009","183439993"
    "Name Three","05/20/2009","8098123089"

I need to get a count for the dates and add the sizes, e.g.:

    5/19/2009, 2, 274730817
    5/20/2009, 2, 8281563082

I am still pretty new to Perl, but it looks like I should load everything into a hash?

Replies are listed 'Best First'.
Re: Counting items in a CSV
by ikegami (Patriarch) on Jun 22, 2009 at 16:49 UTC
    Yup, hashes are great for grouping. But since you have two data points for each date (count and size), you'll need some kind of 2D structure. I used a hash of hashes in the following:
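    use strict;
    use warnings;
    use Text::CSV_XS qw( );

    # Assumption: the CSV file is named on the command line; any open read handle works.
    my $file = shift or die("usage: $0 file.csv\n");
    open(my $fh, '<', $file) or die("Can't open $file: $!\n");

    my $csv = Text::CSV_XS->new();
    $csv->getline($fh);    # skip the "Name","Date","size" header row

    my %size_by_date;
    while ( my $row = $csv->getline($fh) ) {
        my ($name, $date, $size) = @$row;
        ++$size_by_date{$date}{count};         # rows seen for this date
        $size_by_date{$date}{size} += $size;   # running total for this date
    }
    die("csv parse error: " . $csv->error_diag() . "\n") if !$csv->eof();
    close($fh);

    for my $date (keys %size_by_date) {
        my ($count, $size) = @{ $size_by_date{$date} }{qw( count size )};
        print("$date: $count, $size\n");
    }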
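    Note that keys returns the dates in no particular order. If the report should be chronological, here is a minimal sketch, assuming every date is in MM/DD/YYYY form. The date_key helper and the sample hash are hypothetical, made up for illustration; the hash just mimics the shape of %size_by_date.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as %size_by_date.
my %size_by_date = (
    '05/20/2009' => { count => 2, size => 8281563082 },
    '05/19/2009' => { count => 2, size => 274730817 },
);

# Hypothetical helper: turn MM/DD/YYYY into YYYYMMDD so that a plain
# string comparison sorts the dates chronologically.
sub date_key {
    my ($date) = @_;
    my ($month, $day, $year) = split m{/}, $date;
    return sprintf '%04d%02d%02d', $year, $month, $day;
}

my @sorted_dates = sort { date_key($a) cmp date_key($b) } keys %size_by_date;

for my $date (@sorted_dates) {
    my ($count, $size) = @{ $size_by_date{$date} }{qw( count size )};
    print "$date: $count, $size\n";
}
```

    Sorting the raw strings would happen to work within a single year, but it breaks across years, which is why the key puts the year first.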
      Oh wow. A 2d structure seems so obvious now. I swear I bang my head against the wall sometimes cause it feels so good when I stop.
Re: Counting items in a CSV
by perliff (Monk) on Jun 22, 2009 at 18:04 UTC
    For a large number of columns in a tab-delimited or CSV file, it's quite easy to use existing modules to get the information you want. If the files are nicely structured, regardless of the number of columns, you can use the Data::CTable module. When combined with the Statistics::Descriptive module, you can get much more information from your data. Let's say your data looks like this:
    name,date,size
    name1,date1,120
    name2,date2,140
    name3,date3,150
    Well, here's some code, hopefully easy enough to understand:
    use strict;
    use Data::CTable;
    use Statistics::Descriptive;

    my $data = Data::CTable->new("data.txt");    # your csv file
    $data->clean_ws();                           # clean up whitespace
    my $sizecolumn = $data->col('size');         # get column by name

    my $stat = Statistics::Descriptive::Full->new();
    $stat->add_data($sizecolumn);
    print "sum of the column size:", $stat->sum(), "\n";
    It's up to you: you can use the Statistics::Descriptive module to get much more information (sum, mean, median, standard deviation, etc.) from your data, or, if you just need a simple sum, you can add up the elements of the array yourself.
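    For the "add it up yourself" route, a minimal sketch using the core List::Util module might look like this. The literal array ref is a hypothetical stand-in for whatever $data->col('size') gives you with the sample data.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Hypothetical stand-in for the 'size' column of the sample data.
my $sizecolumn = [ 120, 140, 150 ];

# sum() returns undef for an empty list, so fall back to 0.
my $total = sum(@$sizecolumn) // 0;

print "sum of the column size: $total\n";
```

    The // operator needs Perl 5.10 or later; on an older perl, test the result with defined() instead.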

    perliff

    ----------------------

    -with perl on my side

Re: Counting items in a CSV
by bichonfrise74 (Vicar) on Jun 22, 2009 at 23:52 UTC
    Another possible solution...
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV;

    my %hash;
    my $csv = Text::CSV->new();

    while ( my $line = <DATA> ) {
        chomp $line;
        next unless $csv->parse($line);
        my @columns = $csv->fields();
        next if $columns[0] eq "Name";           # skip the header row

        $hash{ $columns[1] }[0]++;               # count of rows per date
        $hash{ $columns[1] }[1] += $columns[2];  # total size per date
    }

    print "$_ -- $hash{$_}[0] -- $hash{$_}[1]\n" for keys %hash;

    __DATA__
    "Name","Date","size"
    "Name One","05/19/2009","151397376"
    "Name Two","05/19/2009","123333441"
    "Name One","05/20/2009","183439993"
    "Name Three","05/20/2009","8098123089"