Re^4: Mean and standard deviation loop

So this is where I am stuck: once I have the data in the @data array, I am not sure how to work through it to do the stats calculation by matching the col1 and col2 ranges.

#!/usr/bin/perl -w
# -------------------------------------------------
use strict;
use Getopt::Long;
use Pod::Usage;
use File::Spec;
Getopt::Long::Configure ("bundling");
use Statistics::R;
# -------------------------------------------------
my $help = 0;
my $force = 0;
my $verbose = 0;
my $result = GetOptions(
    "help|h" =>\$help,
    "force|f"=>\$force, 
    "verbose|v"=>\$verbose,
    );
#--------------------------------------------------
# Assign input data to an array
#---------------------------------------------------
pod2usage(1) if $help;
scalar (@ARGV) == 1 or pod2usage(1);    # pod2usage() exits by itself
my $fname = $ARGV[0];
my $fnameout = $fname;
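$fnameout =~ s/\.\w\w\w$/_conv.csv/;
# Without a three-letter extension the substitution does nothing and the
# output would overwrite the input file.
$fnameout ne $fname or die "Input file needs a three-letter extension: $fname\n";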
# -------------------------------------------------
(-e $fname) or die "Unable to find input file: $fname\n";
(-e $fnameout and not $force) and die "Output file already exists. Use -f to force: $fnameout\n";
my ($fin, $fout);
open ($fin, "<", $fname) or die "Unable to open file: $fname: $!\n";
open ($fout, ">", $fnameout) or die "Unable to open output file: $fnameout: $!\n";
my @headers = qw (Col_1 Col_3 Mean St_Dev);
print $fout (join (",", @headers),"\n");
# -------------------------------------------------
# Chomp all data
# -------------------------------------------------
my $cnt = 0;
my @data=();
my $line;
while (defined($line=<$fin>))
{
    $cnt ++;
    next unless $cnt > 1;                # Skip Header
    chomp($line);
    next if $line =~ /^\s*$/;
    my @cols = split(',',$line);
    push @data, [@cols];
}
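Once the rows are in @data (an array of array refs, one per CSV line), they can be grouped after the fact by the combination of Col 1 and Col 3 using a hash of arrays. This is a minimal sketch, not the OP's code: the sample rows are hypothetical, and it assumes that Col 1 and Col 3 identify a group, that Col 4 holds the value to summarize, and that ':' never appears inside those columns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rows, shaped like what the reading loop pushes onto @data:
# each element is an array ref holding the split columns of one CSV line.
# Assumption: Col 1 and Col 3 identify a group and Col 4 holds the value.
my @data = (
    [ 'alpha', 'x', 'id1', 2 ],
    [ 'alpha', 'y', 'id1', 4 ],
    [ 'beta',  'x', 'id2', 5 ],
    [ 'alpha', 'z', 'id1', 6 ],
);

# Group the Col 4 values by the (Col 1, Col 3) pair.
# The key joins the two columns with ':' (assumed not to occur in the data).
my %groups;
for my $row (@data) {
    my $key = join ':', $row->[0], $row->[2];
    push @{ $groups{$key} }, $row->[3];   # autovivifies the array ref
}

# Show each group and its collected values.
for my $key (sort keys %groups) {
    print "$key => @{ $groups{$key} }\n";
}
```

Because push autovivifies the array ref, an explicit `$groups{$key} ||= []` is not required, although it does no harm.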

Re^5: Mean and standard deviation loop by morgon (Priest) on Jun 17, 2012 at 13:48 UTC

You need to partition the data, that is, you need to process the subsets given by "Col 1" and "Col 3" (i.e. name and id). One way of doing this is to use a hash that uses an array ref to collect the data belonging to one partition:

  my %partitions;
  while (defined($line = <$fin>)) {
      ...   # skip the header, chomp, skip blank lines as before
      my @cols = split(',', $line);
      my $pname = "$cols[0]:$cols[2]";
      $partitions{$pname} ||= [];
      push @{$partitions{$pname}}, $cols[3];
  }

Then you can later process these partitions to calculate, e.g., the mean:

  use List::Util qw(sum);

  for my $pname (sort keys %partitions) {
      my $count = scalar @{$partitions{$pname}};
      my $sum   = sum @{$partitions{$pname}};
      my $mean  = $sum / $count;

      # std dev is left as an exercise
  }

This assumes that the data you want to process are in your "Col 4" only. If you need to include the other columns, simply push them onto the array as well.

BEWARE: Untested code!
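The "exercise" can be finished along these lines. This is a sketch under assumptions, not morgon's code: the %partitions contents are hypothetical, and it uses the sample standard deviation s = sqrt( sum((x_i - mean)^2) / (n - 1) ), which is also what R's sd() returns. For a partition with a single value that quantity is undefined, so the sketch writes NA; switch the denominator to n if you want the population standard deviation. It prints rows in the OP's Col_1,Col_3,Mean,St_Dev layout to STDOUT; in the real script you would print to $fout instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical partitions, as built by the reading loop:
# key "Col1:Col3" => array ref of Col 4 values.
my %partitions = (
    'alpha:id1' => [ 2, 4, 6 ],
    'beta:id2'  => [ 5 ],
);

# Mean and sample standard deviation (n - 1 in the denominator).
# Returns undef for the std dev when there are fewer than two values.
sub mean_sd {
    my @x    = @_;
    my $n    = scalar @x;
    my $mean = sum(@x) / $n;
    return ($mean, undef) if $n < 2;
    my $ss = sum(map { ($_ - $mean) ** 2 } @x);   # sum of squared deviations
    return ($mean, sqrt($ss / ($n - 1)));
}

my @headers = qw(Col_1 Col_3 Mean St_Dev);
print join(',', @headers), "\n";

my %stats;
for my $pname (sort keys %partitions) {
    # Recover the two key columns (assumes ':' is not inside Col 1).
    my ($col1, $col3) = split /:/, $pname, 2;
    my ($mean, $sd) = mean_sd(@{ $partitions{$pname} });
    $stats{$pname} = { mean => $mean, sd => $sd };
    printf "%s,%s,%.4f,%s\n", $col1, $col3, $mean,
        defined $sd ? sprintf('%.4f', $sd) : 'NA';
}
```

With the sample data this prints alpha,id1,4.0000,2.0000 and beta,id2,5.0000,NA after the header line.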
