Descriptive Stats from .csv file

korsmo has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Descriptive Stats from .csv file by kevbot (Vicar) on Jan 30, 2014 at 06:10 UTC
I have found the Data::Table to be a nice module for these types of tasks. It borrows some ideas from R that are helpful for handling tabular data (it has an implementation of the melt and cast functions inspired by the R reshape module). You can accomplish your task with the following code (assuming that your data is saved in a file named data.csv). The author of Data::Table has also made additional documentation/information available here: https://sites.google.com/site/easydatabase/ Data in data.csv: `Month,Zone,Replicate,SpeciesA,SpeciesB,SpeciesC Sept,1,1,5,10,15 Sept,1,2,0,5,10 Sept,1,3,5,0,5 Sept,2,1,5,5,5 Sept,2,2,10,15,10 Sept,2,3,0,0,5` [download] #!/usr/bin/env perl use strict; use warnings; use Data::Table; use Statistics::Lite qw(mean stddev); my $dt = Data::Table::fromCSV("data.csv"); print "Original Data Table\n"; print "===================\n"; print $dt->tsv; print "\n\n"; my $melt = $dt->melt(['Month', 'Zone', 'Replicate']); print "Melt Table\n"; print "==========\n"; print $melt->tsv; print "\n\n"; my $cast_mean = $melt->cast( ['Month', 'Zone'], 'variable', Data::Table::STRING, 'value', \&mean ); print "Cast (mean)\n"; print "===========\n"; print $cast_mean->tsv; print "\n\n"; my $cast_stddev = $melt->cast( ['Month', 'Zone'], 'variable', Data::Table::STRING, 'value', \&stddev ); print "Cast (stddev)\n"; print "=============\n"; print $cast_stddev->tsv; exit; [download] This code will give the following output: Read more... (1497 Bytes)	[reply] [d/l] [select]
Re^2: Descriptive Stats from .csv file by korsmo (Initiate) on Feb 03, 2014 at 18:14 UTC
Hi, That seemed to work well for obtaining the values I need - thanks! I have loaded the module Excel::Writer::XLSX to try to bring those values into a .xlsx file. Do you know if I need to put the cast_means and cast_stddev values into an array before I can write the .xlsx file? Thanks again, BK	[reply]
Re^3: Descriptive Stats from .csv file by kevbot (Vicar) on Feb 08, 2014 at 06:48 UTC
You can use the Data::Table::Excel module to put the contents of a Data::Table object into an Excel file. From the documentation, This perl package provide utility methods to convert between an Excel file and Data::Table objects. It then enables you to take advantage of the Data::Table methods to further manipulate the data and/or export it into other formats such as CSV/TSV/HTML, etc. For example, this code will create an xlsx file that contains both the mean and stddev tables: #!/usr/bin/env perl use strict; use warnings; use Data::Table; use Data::Table::Excel qw(tables2xlsx); use Statistics::Lite qw(mean stddev); my $dt = Data::Table::fromCSV("data.csv"); my $melt = $dt->melt(['Month', 'Zone', 'Replicate']); my $cast_mean = $melt->cast( ['Month', 'Zone'], 'variable', Data::Table::STRING, 'value', \&mean ); my $cast_stddev = $melt->cast( ['Month', 'Zone'], 'variable', Data::Table::STRING, 'value', \&stddev ); tables2xlsx("descriptive_stats.xlsx", [ $cast_mean, $cast_stddev ]); exit; [download]	[reply] [d/l]
Re: Descriptive Stats from .csv file by Anonymous Monk on Jan 30, 2014 at 03:26 UTC
See Re: Perl File Column Comparison. and Re: Need to extract first n rows and parsing multiple files and hash to count occurences, they both links examples of reading csv .... so copy/paste that part, then accumulate whatever sums and counts you need in a hash, say `$foo{ $id } += $val` ... Then , after you're done reading processing accumulating in a hash, you've got a hash of number, apply your statistics functions and print the info Good luck	[reply] [d/l]
Re: Descriptive Stats from .csv file by talexb (Chancellor) on Jan 30, 2014 at 14:39 UTC
I would load the data into a database (sqlite is a great choice) and have the database do the work. Alex / talexb / Toronto Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.	[reply]
Re^2: Descriptive Stats from .csv file by CountZero (Bishop) on Jan 30, 2014 at 20:55 UTC
Or work through DBD::CSV to use those CSV tables directly. CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics	[reply]
Re^3: Descriptive Stats from .csv file by Anonymous Monk on Jan 31, 2014 at 12:45 UTC
Have you tried it?	[reply]
Re^4: Descriptive Stats from .csv file by Tux (Canon) on Jan 31, 2014 at 14:09 UTC
Re^5: Descriptive Stats from .csv file by Anonymous Monk on Jan 31, 2014 at 22:54 UTC
Some notes below your chosen depth have not been shown here
Re^2: Descriptive Stats from .csv file by vincent_veyron (Beadle) on Jan 31, 2014 at 11:39 UTC
If you have postgresql on hand, then it's as easy as : create table data( month text, zone integer, replicate integer, speciesa integer, speciesb integer, speciesc integer); copy data from stdin with delimiter ','; sept,1,1,5,10,15 Sept,1,2,0,5,10 Sept,1,3,5,0,5 Sept,2,1,5,5,5 Sept,2,2,10,15,10 Sept,2,3,0,0,5 \. select month, zone, replicate, avg(speciesa), stddev(speciesa) from da +ta group by month,zone, replicate; month \| zone \| replicate \| avg \| stddev -------+------+-----------+------------------------+-------- Sept \| 1 \| 2 \| 0.00000000000000000000 \| Sept \| 2 \| 3 \| 0.00000000000000000000 \| Sept \| 1 \| 3 \| 5.0000000000000000 \| Sept \| 2 \| 2 \| 10.0000000000000000 \| sept \| 1 \| 1 \| 5.0000000000000000 \| Sept \| 2 \| 1 \| 5.0000000000000000 \| (6 lignes) [download] libremen.com : legal case, contract and insurance claim management software	[reply] [d/l]