country1 has asked for the wisdom of the Perl Monks concerning the following question:


I am trying to calculate column averages (excluding
0's) for all the columns in a CSV file except the first
column. My input CSV file is as follows:

pickcpua.dat IMAGINGNY,1.45,0.42,1.54,1.49,1.47,1.36,1.81,0.47,1.8,0.55,0.38 JBSQLTST,1.29,1.09,1.13,1.88,1.11,1.44,1.25,1.23,1.05,1.39,1.61 SNYCSQL,4.58,4.24,3.87,3.9,4.13,2.04,3.34,7.6,3.58,1.26,7.45 Snynetsrv,26,26.34,24.59,26.46,26.24,26.14,32.35,31.77,31.77,29.92,26. +59 W32SPLMCOR01,8.27,13.23,7.73,8.85,9.15,13.95,0,0,0,0,0 W32SDAPSCT01,3.07,3.14,2.97,3.28,21.65,54.23,3.16,3.02,3.26,2.77,3.40 W32SDASALM01,1.22,1.3,1.11,0.92,1.57,1.06,1.01,0.87,0.93,2.26,0.91 W32SMSCSD02,15.38,12,22.32,23.3,19.74,46.42,2.06,1.7,2.17,2.85,1.74 W32SPLMCOR02,13.24,13.23,7.73,8.85,9.15,13.95,0,0,0,0,0
I have been able to calculate a column average for a particular column (excluding 0's) using the following perl code:
#!/usr/bin/perl use strict; use warnings; my $file = $ARGV[0] || 'pickcpua.dat'; my $col = $ARGV[1]; open(my $CPUFILE, "<", $file) || die "Unable to open sorted file !"; my @values; while (<$CPUFILE>) { chomp; my @col = split /,/; # Assume basic comma delimited if ($col[$ARGV[1]] > 0) { push @values, $col[$ARGV[1]]; # Column 2 is what we want } } my $total = 0; $total += $_ for @values; my $average = $total /@values; print "Average CPU Time for all servers in cpuatest.dat file is $avera +ge";


When trying to have perl calculate the column average
(excluding 0's)for all columns excluding the first
column, using the following perl code, perl is
calculating the 2nd column's average 9 times rather
than computing the column average for the 2nd through
10th column. The code is as follows:

#!/usr/bin/perl use strict; use warnings; my $file = $ARGV[0] || 'pickcpua.dat'; open(my $CPUFILE, "<", $file) || die "Unable to open sorted file !"; my @values; my @colval; for($index=1; $index<10; $index++) { while (<$CPUFILE>) { chomp; my @col = split /,/; # Assume basic comma delimited if ($col[$index] > 0) { push @values, $col[$index]; # Column 2 is what we want push @colval, $col[$index]; # Column 2 is what we want } } ## End while loop my $total = 0; $total += $_ for @colval; print "Pass $index through data \n"; print "colval is @colval \n"; print "values is @values \n"; my $average = $total /@colval; print "Average CPU Time for Column for all servers in cpuatest.dat fil +e is $average \n"; } ## End of initial $index for loop close $CPUFILE or die;


I would appreciate it if someone could tell me what I
am doing wrong in going from the calculation of one
column's average to calculating the column average for
all columns other than the first column. I do not
think I am using the for loop properly in this case

Replies are listed 'Best First'.
Re: Calculating Column Averages From A CSV File
by toolic (Bishop) on Aug 10, 2007 at 15:27 UTC
    I refactored the code quite a bit:
    #!/usr/bin/env perl use strict; use warnings; my $file = $ARGV[0] || 'pickcpua.dat'; open my $CPUFILE, "<", $file or die "Unable to open sorted file $!"; my @col_tot; my @col_num; my @col_avg; while (<$CPUFILE>) { chomp; my @f = split /,/; shift @f; # remove the 1st column my $num_cols = scalar @f; for my $i (0 .. $num_cols-1) { my $val = $f[$i]; if ($val>0) { $col_num[$i]++; $col_tot[$i] += $val; } } } for my $i (0 .. $#col_num) { $col_avg[$i] = $col_tot[$i]/$col_num[$i]; print "Column ", ($i + 2), " Average = $col_avg[$i]\n"; } close $CPUFILE or die; my $total = 0; $total += $_ for @col_avg; print "Average CPU Time for Column for all servers in cpuatest.dat fil +e is "; print $total/@col_avg, "\n";
    Update: Oops. After checking math (for a 2nd time!), this is what I get:
    Column 2 Average = 8.27777777777778 Column 3 Average = 8.33222222222222 Column 4 Average = 8.11 Column 5 Average = 8.77 Column 6 Average = 10.4677777777778 Column 7 Average = 17.8433333333333 Column 8 Average = 6.42571428571429 Column 9 Average = 6.66571428571429 Column 10 Average = 6.36571428571429 Column 11 Average = 5.85714285714286 Column 12 Average = 6.01142857142857 Average CPU Time for Column for all servers in cpuatest.dat file is 8. +46607503607504
    Is this what you are looking for?

      toolic,


      Thanks for your help. Is there also a way of
      calculating the row averages, excluding the first
      column?

        I believe there should be a way to calculate row averages. My poorly-named @f array contains all the values for a given row, except for the 1st column. So, its probably a matter of creating a "row_tot" variable and using the existing @col_num array. Try it out.
Re: Calculating Column Averages From A CSV File
by dwm042 (Priest) on Aug 10, 2007 at 16:03 UTC
    The first thing I'd say is please put your array loops inside your read loop. You gain a lot of efficiency that way. The solution I came up with was (and I'm probably abusing some variables):

    #!/bin/perl use warnings; use strict; package main; my @averages = (); my $count = 0; while (my $data = <DATA>) { my @cols = split /\,/, $data; $count++; if ( $count == 1 ) { @averages = @cols; } else { for( my $i = 1; $i < scalar @averages; $i++ ) { $averages[$i] += $cols[$i]; } } } for( my $i = 1; $i < scalar @averages; $i++ ) { print $averages[$i]/$count, "\n"; } __DATA__ IMAGINGNY,1.45,0.42,1.54,1.49,1.47,1.36,1.81,0.47,1.8,0.55,0.38 JBSQLTST,1.29,1.09,1.13,1.88,1.11,1.44,1.25,1.23,1.05,1.39,1.61 SNYCSQL,4.58,4.24,3.87,3.9,4.13,2.04,3.34,7.6,3.58,1.26,7.45 Snynetsrv,26,26.34,24.59,26.46,26.24,26.14,32.35,31.77,31.77,29.92,26. +59 W32SPLMCOR01,8.27,13.23,7.73,8.85,9.15,13.95,0,0,0,0,0 W32SDAPSCT01,3.07,3.14,2.97,3.28,21.65,54.23,3.16,3.02,3.26,2.77,3.40 W32SDASALM01,1.22,1.3,1.11,0.92,1.57,1.06,1.01,0.87,0.93,2.26,0.91 W32SMSCSD02,15.38,12,22.32,23.3,19.74,46.42,2.06,1.7,2.17,2.85,1.74 W32SPLMCOR02,13.24,13.23,7.73,8.85,9.15,13.95,0,0,0,0,0
    And the output is:
    C:\Code>perl columns.pl 8.27777777777778 8.33222222222222 8.11 8.77 10.4677777777778 17.8433333333333 4.99777777777778 5.18444444444444 4.95111111111111 4.55555555555556 4.67555555555556
    If you want to deal with irregular length CSV, you'll need some logic to handle the different counts per column. Something like a push of a $value, $count will be necessary.
Re: Calculating Column Averages From A CSV File
by technojosh (Priest) on Aug 10, 2007 at 15:21 UTC
    if the values you need to average are in @colval:

    my $howMany = @colval; my $total = 0; foreach my $val ( @colval ) { $total = $total + $val; } my $avg = $total / $howMany;

    I'm sure there's a 'shorter' way to do it, but i'm still getting my code minimalist skills up to par...

Re: Calculating Column Averages From A CSV File
by eric256 (Parson) on Aug 13, 2007 at 23:50 UTC

    Here is one way to do it. I scan over all of the data loading it into $data. Then scan over that storing the sum and count of non negative items per row and appending the average to the end of the row. While doing that it also stores the per column counts and sums in array refs. Then after everything is done it appends a new row with the averages for each column. It could probably be shrunk down a bit but i figured i'd start it out verbose for you.

    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; my $data; #read all data into an AofA while (<DATA>) { chomp; push @$data, [split (/,/, $_)]; } my $cols = 11; my $col_sum; my $col_count; my $cur_col; for my $row (@$data) { my $row_sum; my $row_count; my $cur_col = 1; for my $val (@$row[1..$cols]) { if ($val != 0) { $row_sum += $val; $row_count++; $col_sum->[$cur_col] += $val; $col_count->[$cur_col]++; } $cur_col++; } push @$row, $row_sum/ $row_count; } #print Dumper({col_sum=>$col_sum, col_count=>$col_count}); push @$data, ["TOTAL", map { $col_sum->[$_] / $col_count->[$_] } 1..$c +ols]; for my $row (@$data) { print join(",", @$row),"\n"; } __DATA__ IMAGINGNY,1.45,0.42,1.54,1.49,1.47,1.36,1.81,0.47,1.8,0.55,0.38 JBSQLTST,1.29,1.09,1.13,1.88,1.11,1.44,1.25,1.23,1.05,1.39,1.61 SNYCSQL,4.58,4.24,3.87,3.9,4.13,2.04,3.34,7.6,3.58,1.26,7.45 Snynetsrv,26,26.34,24.59,26.46,26.24,26.14,32.35,31.77,31.77,29.92,26. +59 W32SPLMCOR01,8.27,13.23,7.73,8.85,9.15,13.95,0,0,0,0,0 W32SDAPSCT01,3.07,3.14,2.97,3.28,21.65,54.23,3.16,3.02,3.26,2.77,3.40 W32SDASALM01,1.22,1.3,1.11,0.92,1.57,1.06,1.01,0.87,0.93,2.26,0.91 W32SMSCSD02,15.38,12,22.32,23.3,19.74,46.42,2.06,1.7,2.17,2.85,1.74 W32SPLMCOR02,13.24,13.23,7.73,8.85,9.15,13.95,0,0,0,0,0


    ___________
    Eric Hodges

      And a shorter version just for kicks:

      #!/usr/bin/perl use strict; use warnings; use Data::Dumper; my $data; #read all data into an AofA sub avg { my $s; $s += $_ for @_; return $s / grep { $_ != 0 } @_; }; while (<DATA>) { chomp; push @$data, [@$_, avg( @$_[1..@$_-1])] for ([split(/,/, $_)]); } push @$data, ["TOTAL", map { my $col = $_; avg( map { $_->[$col] } @$ +data) } 1..11]; print join(",", @$_),"\n" for @$data; __DATA__ IMAGINGNY,1.45,0.42,1.54,1.49,1.47,1.36,1.81,0.47,1.8,0.55,0.38 JBSQLTST,1.29,1.09,1.13,1.88,1.11,1.44,1.25,1.23,1.05,1.39,1.61 SNYCSQL,4.58,4.24,3.87,3.9,4.13,2.04,3.34,7.6,3.58,1.26,7.45 Snynetsrv,26,26.34,24.59,26.46,26.24,26.14,32.35,31.77,31.77,29.92,26. +59 W32SPLMCOR01,8.27,13.23,7.73,8.85,9.15,13.95,0,0,0,0,0 W32SDAPSCT01,3.07,3.14,2.97,3.28,21.65,54.23,3.16,3.02,3.26,2.77,3.40 W32SDASALM01,1.22,1.3,1.11,0.92,1.57,1.06,1.01,0.87,0.93,2.26,0.91 W32SMSCSD02,15.38,12,22.32,23.3,19.74,46.42,2.06,1.7,2.17,2.85,1.74 W32SPLMCOR02,13.24,13.23,7.73,8.85,9.15,13.95,0,0,0,0,0

      And a final tiny squeeze:

      #!/usr/bin/perl use strict; use warnings; use List::Util qw/sum/; my $data; sub avg { return sum(@_) / grep { $_ != 0 } @_;}; while (<DATA>) { chomp; push @$data, [@$_, avg( @$_[1..@$_-1])] for ([split(/,/, $_)]); } push @$data, ["TOTAL", map { my $col = $_; avg( map { $_->[$col] } @$ +data) } 1..11]; print join(",", @$_),"\n" for @$data;

      ___________
      Eric Hodges