I was watching someone try to find the sum of a particular column in a file. He was loading it into a data application (one meant to work with large files) just to calculate this sum.
It takes a while to set up the app, as you have to read the entire file, give it variable names, etc. So I wrote this one-liner, which worked great (and saved a lot of time):
cut -d, -f7 in.csv | perl -nle '$sum += $_; print "Sum = $sum" if eof;'
I could avoid the "cut", but I didn't see a huge advantage (if someone can give me good reasons to avoid cut, that would be nice!). A pure-Perl version is sketched below.
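For what it's worth, the cut can be folded into Perl with autosplit; a minimal sketch, assuming the same comma-separated input and the 7th field (index 6 of @F):

perl -F, -lane '$sum += $F[6]; print "Sum = $sum" if eof' in.csv

Here -a splits each line into @F on the pattern given by -F, so the cut process (and the pipe) goes away.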
Please note that the file is pretty large (around 5 million rows and a few hundred columns)... Since it worked out well, that person asked me how to modify the code to make it work for 5 columns. I immediately used an array (the return from a split /,/) and looped through the list to sum the columns every time a new row came in! Little did I realize at the time of writing that this would have horrible performance... After letting it run for a few minutes, I realized that running that loop millions of times is not such a good idea (a bad idea, rather)... Maybe just declaring 5 variables would have been better... Roughly what I wrote is sketched below.
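For reference, what I wrote was roughly this (a sketch from memory; summing the first 5 columns is assumed):

perl -nle '@f = split /,/; $sum[$_] += $f[$_] for 0..4; if (eof) { print "Sum of column ", $_ + 1, " = $sum[$_]" for 0..4 }' in.csv

The unrolled alternative I had in mind would replace the inner for loop with five scalar accumulators, one += per column on each row.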
So my question is: how would Monks handle such a problem?
Thanks all for your time!
-SK
In reply to Efficient way to sum columns in a file by sk