SixShot has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, Please help l am still a Perl novice and cant get my head around this loop l need to write in order to create a new text file with the mean and standard deviation from ranges in a csv file. Here is an example of the file:

Col1,Col2,Col3,Col4,Col5,Col6 Name 1,IGNORE,ID 1,0.37982135,0.791818317,0.652424768 Name 1,IGNORE,ID 1,0.721513387,0.881438903,0.460097013 Name 1,IGNORE,ID 1,0.804540751,0.30103416,0.307076201 Name 1,IGNORE,ID 1,0.244148341,0.665789239,0.073428582 Name 1,IGNORE,ID 2,0.425204974,0.830647697,0.026222533 Name 1,IGNORE,ID 2,0.64290764,0.506440565,0.593350606 Name 1,IGNORE,ID 2,0.299279085,0.279361006,0.639529638 Name 1,IGNORE,ID 2,0.696625882,0.910134661,0.462723154 Name 1,IGNORE,ID 2,0.775977962,0.972299177,0.574419934 Name 1,IGNORE,ID 2,0.145839987,0.785302378,0.856054519 Name 1,IGNORE,ID 2,0.380588138,0.184563257,0.614053111 Name 1,IGNORE,ID 2,0.657711661,0.960401631,0.687047718 Name 1,IGNORE,ID 3,0.50094279,0.244414627,0.027866195 Name 1,IGNORE,ID 3,0.247070699,0.329813346,0.72012615 Name 1,IGNORE,ID 3,0.528465719,0.519431152,0.380683376 Name 1,IGNORE,ID 3,0.001334767,0.772583548,0.283821376 Name 1,IGNORE,ID 3,0.035444487,0.177265159,0.65665088 Name 1,IGNORE,ID 3,0.390519351,0.614277413,0.115720898 Name 1,IGNORE,ID 3,0.290907022,0.623986491,0.114367918 Name 1,IGNORE,ID 3,0.960698899,0.282528688,0.789410834 Name 2,IGNORE,ID 1,0.52029931,0.217956687,0.034778849 Name 2,IGNORE,ID 1,0.390883709,0.433090757,0.834412219 Name 2,IGNORE,ID 1,0.969661816,0.256131736,0.382298203 Name 2,IGNORE,ID 1,0.182221598,0.50228435,0.273815182 Name 2,IGNORE,ID 1,0.155900495,0.164124636,0.121096495 Name 2,IGNORE,ID 1,0.877336374,0.673715671,0.381463711 Name 2,IGNORE,ID 2,0.132989121,0.275819298,0.522132919 Name 2,IGNORE,ID 2,0.566069681,0.778573666,0.577410397 Name 2,IGNORE,ID 2,0.861289328,0.751731334,0.895456821 Name 2,IGNORE,ID 2,0.601763435,0.054547388,0.422025303 Name 2,IGNORE,ID 2,0.239323207,0.026933112,0.345341484 Name 2,IGNORE,ID 2,0.785210274,0.031020763,0.043997356 Name 2,IGNORE,ID 2,0.849328727,0.159601291,0.006723201 Name 2,IGNORE,ID 2,0.882972537,0.839234269,0.11965987 Name 2,IGNORE,ID 2,0.114967039,0.467529515,0.325535197 Name 2,IGNORE,ID 3,0.219825579,0.776021711,0.887957411 Name 2,IGNORE,ID 3,0.615890229,0.846215519,0.808795755 Name 2,IGNORE,ID 3,0.382284625,0.649516417,0.164323894 Name 2,IGNORE,ID 3,0.932202153,0.954362672,0.709015286 Name 2,IGNORE,ID 3,0.159515113,0.417265005,0.14523904 Name 2,IGNORE,ID 3,0.57268037,0.289845554,0.052907578 Name 2,IGNORE,ID 3,0.236739875,0.844689238,0.728153225 Name 2,IGNORE,ID 3,0.335484964,0.608410584,0.165027002 Name 2,IGNORE,ID 3,0.115758212,0.377994434,0.22737545 Name 2,IGNORE,ID 3,0.58695053,0.192963069,0.48771977 Name 2,IGNORE,ID 4,0.501302506,0.379030488,0.870477352 Name 2,IGNORE,ID 4,0.808307454,0.842780413,0.786997964 Name 2,IGNORE,ID 4,0.269026636,0.284492031,0.773855038 Name 2,IGNORE,ID 4,0.700413158,0.10286498,0.523411249 Name 2,IGNORE,ID 4,0.118227472,0.284542899,0.793854806 Name 2,IGNORE,ID 4,0.512343377,0.319557477,0.53841837 Name 2,IGNORE,ID 4,0.819964968,0.879054472,0.765823587 Name 2,IGNORE,ID 4,0.86335164,0.475113054,0.627558806 Name 2,IGNORE,ID 4,0.750244582,0.613032699,0.182777923 Name 2,IGNORE,ID 4,0.275165823,0.021067739,0.612086553 Name 2,IGNORE,ID 4,0.226488906,0.635489023,0.747917162 Name 2,IGNORE,ID 4,0.488732866,0.587405833,0.412799804 Name 2,IGNORE,ID 4,0.596252248,0.56288024,0.923117578 Name 2,IGNORE,ID 4,0.315232094,0.905375958,0.640880331 Name 2,IGNORE,ID 4,0.419892207,0.501447283,0.841059496 Name 2,IGNORE,ID 5,0.027954045,0.603952859,0.179546933 Name 2,IGNORE,ID 5,0.203437218,0.832374684,0.332714406 Name 2,IGNORE,ID 5,0.492690906,0.490600928,0.718792343 Name 2,IGNORE,ID 5,0.761181274,0.866338049,0.315125996 Name 2,IGNORE,ID 5,0.968684085,0.465075264,0.453525826 Name 2,IGNORE,ID 5,0.965926406,0.511947021,0.63041369 Name 2,IGNORE,ID 5,0.830650173,0.145010257,0.854201445 Name 2,IGNORE,ID 5,0.740655024,0.496291265,0.965611823 Name 2,IGNORE,ID 5,0.594504215,0.320318337,0.460041343 Name 2,IGNORE,ID 5,0.469701381,0.346615537,0.729423429 Name 2,IGNORE,ID 5,0.364647905,0.912741612,0.200696706 Name 2,IGNORE,ID 5,0.650916143,0.143698107,0.71461251 Name 2,IGNORE,ID 5,0.236520771,0.846877843,0.47651797 Name 2,IGNORE,ID 5,0.83696323,0.886732329,0.7868661 Name 2,IGNORE,ID 5,0.048179463,0.302861896,0.942740508 Name 2,IGNORE,ID 5,0.561141223,0.157787635,0.921377274 Name 2,IGNORE,ID 6,0.054753633,0.496822999,0.426184382 Name 2,IGNORE,ID 6,0.392287968,0.089318893,0.410990992 Name 2,IGNORE,ID 6,0.219582733,0.287663836,0.033101737 Name 2,IGNORE,ID 6,0.666103181,0.505495999,0.728609282 Name 2,IGNORE,ID 6,0.997576553,0.135051446,0.523997905 Name 2,IGNORE,ID 6,0.440280551,0.320183615,0.134193205 Name 2,IGNORE,ID 6,0.46226184,0.208013033,0.865425765 Name 2,IGNORE,ID 6,0.869363826,0.301666045,0.763282357 Name 2,IGNORE,ID 6,0.986298107,0.665474133,0.924482868 Name 2,IGNORE,ID 6,0.130406143,0.924405695,0.390965551 Name 2,IGNORE,ID 6,0.941576004,0.40262737,0.232998427 Name 2,IGNORE,ID 6,0.557452214,0.826768069,0.009160847 Name 2,IGNORE,ID 6,0.713443893,0.167423963,0.880938869 Name 2,IGNORE,ID 6,0.070524667,0.223616125,0.402114973 Name 2,IGNORE,ID 6,0.210950729,0.127250272,0.659798497 Name 3,IGNORE,ID 1,0.331583378,0.05338106,0.128834081 Name 3,IGNORE,ID 1,0.690538179,0.789946028,0.945991209 Name 3,IGNORE,ID 1,0.165157277,0.998774239,0.0455421 Name 3,IGNORE,ID 1,0.605826929,0.080406226,0.126499074 Name 3,IGNORE,ID 1,0.956784451,0.311633382,0.47904799 Name 3,IGNORE,ID 1,0.032108824,0.914251623,0.465807797 Name 3,IGNORE,ID 1,0.209457389,0.803181549,0.359003069 Name 3,IGNORE,ID 1,0.53235704,0.855123438,0.304031395 Name 3,IGNORE,ID 2,0.976087028,0.327735368,0.529816747 Name 3,IGNORE,ID 2,0.678639311,0.704008305,0.186024057 Name 3,IGNORE,ID 2,0.95579316,0.946794771,0.093978138 Name 3,IGNORE,ID 2,0.535135211,0.616836963,0.555452335 Name 3,IGNORE,ID 2,0.353519383,0.743495309,0.73279098 Name 3,IGNORE,ID 2,0.692094099,0.376559244,0.008859322 Name 3,IGNORE,ID 2,0.36585992,0.378740255,0.7876411 Name 3,IGNORE,ID 2,0.013224478,0.496042768,0.034245886 Name 3,IGNORE,ID 2,0.376288745,0.562422774,0.889485667 Name 3,IGNORE,ID 2,0.507718223,0.413729714,0.893780487 Name 3,IGNORE,ID 2,0.086020289,0.575655064,0.478651028 Name 3,IGNORE,ID 3,0.714735222,0.061739536,0.417562291 Name 3,IGNORE,ID 3,0.831754648,0.223515306,0.070781154 Name 3,IGNORE,ID 3,0.231560448,0.075462932,0.196199016 Name 3,IGNORE,ID 3,0.888036581,0.657571916,0.020372209 Name 3,IGNORE,ID 3,0.500038109,0.735243357,0.21473527 Name 3,IGNORE,ID 3,0.269553886,0.92740133,0.579772528 Name 3,IGNORE,ID 3,0.262827679,0.690383144,0.779901605 Name 3,IGNORE,ID 3,0.560589794,0.828025582,0.507173871 Name 3,IGNORE,ID 3,0.02545088,0.057554697,0.533111624 Name 3,IGNORE,ID 3,0.181576318,0.943676799,0.133749761 Name 3,IGNORE,ID 3,0.202273952,0.117622017,0.614573059 Name 3,IGNORE,ID 3,0.658081544,0.396384996,0.251594415 Name 3,IGNORE,ID 3,0.666397989,0.627433713,0.192909205 Name 3,IGNORE,ID 4,0.834378245,0.93129209,0.482303893 Name 3,IGNORE,ID 4,0.435969447,0.964872927,0.409713174 Name 3,IGNORE,ID 4,0.315667797,0.741551553,0.724828033 Name 3,IGNORE,ID 4,0.441599787,0.387304774,0.932125435 Name 3,IGNORE,ID 4,0.132473891,0.397548401,0.769881268 Name 3,IGNORE,ID 4,0.656655881,0.491286593,0.736728362 Name 3,IGNORE,ID 4,0.714934678,0.939271723,0.424204648 Name 3,IGNORE,ID 4,0.28152302,0.238264583,0.403745356 Name 3,IGNORE,ID 4,0.397335519,0.694688244,0.995862278 Name 3,IGNORE,ID 4,0.803515204,0.016659858,0.686058043 Name 3,IGNORE,ID 4,0.382836019,0.898355202,0.525285486 Name 3,IGNORE,ID 4,0.71194337,0.525643652,0.563393929 Name 3,IGNORE,ID 4,0.425912809,0.950649598,0.83262041 Name 3,IGNORE,ID 5,0.107861783,0.090576382,0.810290528 Name 3,IGNORE,ID 5,0.812035455,0.362919345,0.603067695 Name 3,IGNORE,ID 5,0.875468795,0.288728237,0.716083531 Name 3,IGNORE,ID 5,0.583141247,0.089452692,0.847372685 Name 3,IGNORE,ID 5,0.747409815,0.203152854,0.316762478 Name 3,IGNORE,ID 5,0.148171719,0.305170065,0.002037187 Name 3,IGNORE,ID 5,0.534605619,0.120470186,0.01753885 Name 3,IGNORE,ID 5,0.570773853,0.58967248,0.55075633 Name 3,IGNORE,ID 5,0.213932106,0.748859437,0.017916466 Name 3,IGNORE,ID 5,0.672609412,0.75729189,0.874194686 Name 3,IGNORE,ID 5,0.078913513,0.295277146,0.9605599 Name 3,IGNORE,ID 6,0.161652468,0.256321818,0.291330684 Name 3,IGNORE,ID 6,0.768664648,0.807839215,0.664323076 Name 3,IGNORE,ID 6,0.855204581,0.606592189,0.710802554 Name 3,IGNORE,ID 6,0.282669948,0.193352094,0.714680507 Name 3,IGNORE,ID 6,0.809227298,0.814022818,0.983258816 Name 3,IGNORE,ID 6,0.325573758,0.510825554,0.609916485 Name 3,IGNORE,ID 6,0.374756314,0.997788561,0.184902868 Name 3,IGNORE,ID 6,0.351667219,0.217427335,0.134593906 Name 3,IGNORE,ID 6,0.872674608,0.533825578,0.752141807 Name 3,IGNORE,ID 6,0.321759984,0.798189175,0.645437855 Name 3,IGNORE,ID 6,0.599191186,0.220917242,0.277175086 Name 3,IGNORE,ID 6,0.852745332,0.271589722,0.861608693 Name 3,IGNORE,ID 6,0.628306658,0.852902302,0.520219421 Name 3,IGNORE,ID 7,0.324223654,0.376985921,0.552674926 Name 3,IGNORE,ID 7,0.162620802,0.209952181,0.066167776 Name 3,IGNORE,ID 7,0.688614864,0.674588849,0.343567297 Name 3,IGNORE,ID 7,0.897478871,0.699221782,0.697716567 Name 3,IGNORE,ID 7,0.958406136,0.95946889,0.257582265 Name 3,IGNORE,ID 7,0.66486043,0.315839631,0.265555319 Name 3,IGNORE,ID 7,0.744613767,0.842190349,0.089507978 Name 3,IGNORE,ID 7,0.90787385,0.754901563,0.015983226 Name 3,IGNORE,ID 7,0.463885541,0.98461792,0.11082385 Name 3,IGNORE,ID 7,0.506738303,0.945769999,0.017478177 Name 3,IGNORE,ID 7,0.269568318,0.263827257,0.851695483 Name 3,IGNORE,ID 7,0.138290466,0.802143341,0.983498242 Name 3,IGNORE,ID 7,0.607150506,0.025864193,0.551843394 Name 3,IGNORE,ID 8,0.500603368,0.400331981,0.791633065 Name 3,IGNORE,ID 8,0.605062432,0.067761939,0.367149629 Name 3,IGNORE,ID 8,0.329244606,0.410084524,0.276165148 Name 3,IGNORE,ID 8,0.171468104,0.452224236,0.355038138 Name 3,IGNORE,ID 8,0.50592024,0.482206447,0.248042776 Name 3,IGNORE,ID 8,0.352207588,0.771484409,0.517947243 Name 3,IGNORE,ID 8,0.608753402,0.197817955,0.849418686 Name 3,IGNORE,ID 8,0.584675616,0.4401131,0.522428859 Name 3,IGNORE,ID 8,0.362588613,0.245743931,0.3609012 Name 3,IGNORE,ID 8,0.997325869,0.52338963,0.864007116

I need to loop through this file and calculate the mean and standard deviation for each Name and ID pair (cols 1 and 3). So as to produce an output like this:

Col1,Col3,mean ,standard Dev Name 1,ID 1,VALUE ?,VALUE ? Name 1,ID 2,VALUE ?,VALUE ? Name 1,ID 3,VALUE ?,VALUE ? Name 2,ID 1,VALUE ?,VALUE ? Name 2,ID 2,VALUE ?,VALUE ? Name 2,ID 3,VALUE ?,VALUE ? Name 2,ID 4,VALUE ?,VALUE ? Name 2,ID 5,VALUE ?,VALUE ? Name 2,ID 6,VALUE ?,VALUE ? Name 3,ID 1,VALUE ?,VALUE ? Name 3,ID 2,VALUE ?,VALUE ? Name 3,ID 3,VALUE ?,VALUE ? Name 3,ID 4,VALUE ?,VALUE ? Name 3,ID 5,VALUE ?,VALUE ? Name 3,ID 6,VALUE ?,VALUE ? Name 3,ID 7,VALUE ?,VALUE ? Name 3,ID 8,VALUE ?,VALUE ?

Here is what l have so far:

#!/usr/bin/perl -w # ------------------------------------------------- use strict; use Getopt::Long; use Pod::Usage; use File::Spec; Getopt::Long::Configure ("bundling"); # ------------------------------------------------- my $help = 0; my $force = 0; my $verbose = 0; my $result = GetOptions( "help|h" =>\$help, "force|f"=>\$force, "verbose|v"=>\$verbose, #-------------------------------------------------- # Assign input data to an arrray #--------------------------------------------------- scalar (@ARGV) == 1 or die pod2usage(1); my $fname = $ARGV[0]; my $fnameout = $fname; $fnameout =~ s/\.\w\w\w$/_conv.csv/; # ------------------------------------------------- (-e $fname) or die "Unable to find input file: $fname\n"; (-e $fnameout and $force) and die "Output file already exisys. Use -f +to force: $fnameout\n"; my ($fin, $fout); open ($fin, "<$fname") or die "Unable to open file: $fname\n"; open ($fout, ">$fnameout") or die "Unable to open output file: $fnameo +ut\n"; my @headers = qw (Col_1 Col_3 Mean St_Dev); print $fout (join (",", @headers),"\n"); # ------------------------------------------------- # Chomp all data # ------------------------------------------------- my @data = <$fin>; # ------------------------------------------------- my $cnt = 0; foreach my $line (@data) { $cnt ++; next unless $cnt > 1; # Skip Header chomp($line); }

I would really appreciate your help with this as my mind has got stuck on this loop. And l cant seem to work out whether to use an array or hash. Many thanks Monks!!

Replies are listed 'Best First'.
Re: Mean and standard deviation loop
by roboticus (Chancellor) on Jun 17, 2012 at 13:32 UTC

    SixShot:

    When your input data is sorted, I normally approach it like this:

    # Compute average value for each group my ($prev_key, $cnt, $sum); foreach my $line (@data) { my ($key, $val) = parse_line($line); if (defined($prev_key) and $prev_key eq $key) { # Same group as last time, accumulate results ++$cnt, $sum += $val; } else { # New group starting. Print results from # previous group, then start new one. print "Group $prev_key avg=", $sum/$cnt, ", cnt=", $cnt, "\n"; $prev_key = $key, $cnt=1, $sum=$val; } } # Print results from last group. print "Group $prev_key avg=", $sum/$cnt, ", cnt=", $cnt, "\n";

    And when you're data isn't sorted, you can always sort it first!

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Hi roboticus, Thanks for your help, can l ask what the parse_line subroutine is?

        SixShot:

        Sure, it's just a routine to pull out the value(s) you're interested in. In your case, you could probably use split and adjust the variable names on the left to represent the value(s) you want. You can use 'undef' for the columns holding values you don't care about, or just give them names. I frequently do the latter for documentation (so I don't have to look up the file format to figure out what a particular column is in case I want to modify the program).

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Mean and standard deviation loop
by morgon (Priest) on Jun 17, 2012 at 11:41 UTC
    Just dumping all your data here may be an efficient way for you to get your problems solved by someone else, but it isn't for the community.

    At the very least you should make an attempt yourself.

    I am willing to discuss code, not someone else's data-dump.

      ,p>ahh apologies, l totally understand what you mean. Here is the start of my scripy

      #!/usr/bin/perl -w # ------------------------------------------------- use strict; use Getopt::Long; use Pod::Usage; use File::Spec; Getopt::Long::Configure ("bundling"); # ------------------------------------------------- my $help = 0; my $force = 0; my $verbose = 0; my $result = GetOptions( "help|h" =>\$help, "force|f"=>\$force, "verbose|v"=>\$verbose, #-------------------------------------------------- # Assign input data to an arrray #--------------------------------------------------- scalar (@ARGV) == 1 or die pod2usage(1); my $fname = $ARGV[0]; my $fnameout = $fname; $fnameout =~ s/\.\w\w\w$/_conv.csv/; # ------------------------------------------------- (-e $fname) or die "Unable to find input file: $fname\n"; (-e $fnameout and $force) and die "Output file already exisys. Use -f +to force: $fnameout\n"; my ($fin, $fout); open ($fin, "<$fname") or die "Unable to open file: $fname\n"; open ($fout, ">$fnameout") or die "Unable to open output file: $fnameo +ut\n"; my @headers = qw (Col_1 Col_3 Mean St_Dev); print $fout (join (",", @headers),"\n"); # ------------------------------------------------- # Chomp all data # ------------------------------------------------- my @data = <$fin>; # ------------------------------------------------- my $cnt = 0; foreach my $line (@data) { $cnt ++; next unless $cnt > 1; # Skip Header chomp($line); }

      its just the input section, l really am stuck on how to loop over this my programmers mind is still quite novice im afraid and l am still struggling on working through loops in my mind.

        That is not an attempt to solve the problem at hand.

        It is just boiler-plate code (that you may simply have copied).

        Are you claiming you know how to read a file into an array but not how to loop in Perl?

        Your problem is to parse the data then partition the data and compute mean and std dev on the paritions.

        Before I provide any more feedback I would like to see an attempt there as otherwise I have the feeling you are simply trying to free-ride.

        This site is the Perl-monks, not the homework-monks.

Re: Mean and standard deviation loop
by Cristoforo (Curate) on Jun 17, 2012 at 15:51 UTC
    This could also easily be solved using DBI to calculate the averages. If you are familiar with SQL, this could be a possible solution.
    #!/usr/bin/perl use strict; use warnings; use Text::Table; use DBI; my $dbh = DBI->connect("DBI:CSV:"); $dbh->{'csv_tables'}->{'data'} = { 'file' => 'junk.txt'}; my $sql = <<SQL; select Col1, Col3, AVG(Col4), AVG(Col5), AVG(Col6) from data group by Col1, Col3 order by Col1, Col3 SQL my $sth = $dbh->prepare( $sql ); $sth->execute; my $tb = Text::Table->new(qw/ Col1 Col3 Avg_Col4 Avg_Col5 Avg_Col6 /); while ( my $row = $sth->fetchrow_arrayref ) { $tb->load($row); } print $tb; __END__ C:\Old_Data\perlp>perl t.pl Col1 Col3 Avg_Col4 Avg_Col5 Avg_Col6 Name 1 ID 1 0.53750595725 0.66002015475 0.373256641 Name 1 ID 2 0.503016916125 0.6786437965 0.556675151625 Name 1 ID 3 0.36942296675 0.445537553 0.386080953375 Name 2 ID 1 0.516050550333333 0.3745506395 0.337977443166667 Name 2 ID 2 0.559323705444444 0.376110070666667 0.362031394222222 Name 2 ID 3 0.415733165 0.5957284203 0.4376514411 Name 2 ID 4 0.5109963958 0.492942305933333 0.669402401266667 Name 2 ID 5 0.547109591375 0.5205764764375 0.605138018875 Name 2 ID 6 0.5141908028 0.378785432866667 0.492416377133333 Name 3 ID 1 0.440476683375 0.600837193125 0.356844589375 Name 3 ID 2 0.503670895181818 0.558365503181818 0.471884158818182 Name 3 ID 3 0.460990542307692 0.487847332692308 0.347110462153846 Name 3 ID 4 0.502672743615385 0.629029938307692 0.652826947307692 Name 3 ID 5 0.485902119727273 0.350142792181818 0.519689121454545 Name 3 ID 6 0.554161077076923 0.544737969461538 0.565414750615385 Name 3 ID 7 0.564178885230769 0.604259375076923 0.369545730769231 Name 3 ID 8 0.5017849838 0.3991158152 0.515273186

    Chris

Re: Mean and standard deviation loop
by cheekuperl (Monk) on Jun 17, 2012 at 12:57 UTC
    I need to loop through this file and calculate the mean and standard deviation for each Name and ID pair (cols 1 and 3).
    #Pseudocode/pseduoalgo - whatever you call it @arr=<FILE>; #slurp the file into array for $line (@arr) #Loop through the file { #Do you need to chomp the line? #Do you need to split the line on comma delimiter? #Do you need a mean calculation formula here? #Do you need a std dev formula here? #Do you need to print the calculated values in new file? }