sk has asked for the wisdom of the Perl Monks concerning the following question:
I was watching someone trying to find the sum of a particular column in a file. He was loading it into a data application (meant to work with large files) just to compute this sum. Setting up the app takes a while, since you have to read the entire file, assign variable names, and so on. So I wrote this one-liner, which worked great (and saved a lot of time):
cut -d, -f7 in.csv | perl -nle '$sum += $_; print "Sum = $sum" if eof;'
I could avoid the "cut", but I didn't see a huge advantage (if someone can give me a good reason to avoid cut, that would be nice!)...
Please note that the file is pretty large (around 5 million rows and a few hundred columns)... Since it worked out well, that person asked me how to modify the code to work for 5 columns. I immediately split each row on /,/ into an array and looped through the list to add to the column sums as each new row came in. Little did I realize at the time that this would have horrible performance... After letting it run for a few minutes, I realized that looping millions of times was not such a good idea (a bad idea, rather?)... Maybe just declaring 5 scalar variables would have been better...
So my question is how would Monks handle such a problem?
Thanks all for your time!
-SK
Replies are listed 'Best First'.
Re: Efficient way to sum columns in a file
  by dave0 (Friar) on Apr 13, 2005 at 05:08 UTC
    by sk (Curate) on Apr 13, 2005 at 05:44 UTC

Re: Efficient way to sum columns in a file
  by tlm (Prior) on Apr 13, 2005 at 11:55 UTC
    by Anonymous Monk on Apr 13, 2005 at 13:48 UTC
    by tlm (Prior) on Apr 13, 2005 at 14:04 UTC

Re: Efficient way to sum columns in a file
  by Random_Walk (Prior) on Apr 13, 2005 at 12:58 UTC
    by sk (Curate) on Apr 13, 2005 at 18:12 UTC
    by Roy Johnson (Monsignor) on Apr 13, 2005 at 20:40 UTC

Re: Efficient way to sum columns in a file
  by eibwen (Friar) on Apr 13, 2005 at 06:51 UTC