in reply to Efficient way to sum columns in a file
I generated 500,000 lines of random CSV with this script:
#!/usr/bin/perl
use strict;
use warnings;

# create source numbers if they don't exist
my $many   = 500000;
my $source = 'numbers.csv';

open CSV, '>', $source or die "can't write to $source: $!\n";

for (1..$many) {
    my @numbers;
    push @numbers, (int rand 1000) for (1..5);
    print CSV join ",", @numbers;
    print CSV $/;
}
Then I tried a few one-liners to sum the columns. I ran each twice and post the second timing, to allow for caching.
nph>time cat numbers.csv | perl -nle'@d=split /,/;$a[$_]+=$d[$_] for (0..4);END{print join "\t", @a}'
249959157   249671314   249649377   250057435   249420634

real    0m17.10s
user    0m15.46s
sys     0m0.08s

nph>time perl -nle'my @d=split /,/;$a[$_]+=$d[$_] for (0..4);END{print join "\t", @a}' numbers.csv
249959157   249671314   249649377   250057435   249420634

real    0m13.71s
user    0m12.77s
sys     0m0.04s

nph>time perl -nle'my($a,$b,$c,$d,$e)=split /,/;$ta+=$a, $tb+=$b, $tc+=$c, $td+=$d, $te+=$e;END{print join "\t", $ta,$tb,$tc,$td,$te}' numbers.csv
249959157   249671314   249649377   250057435   249420634

real    0m6.45s
user    0m5.91s
sys     0m0.07s
The last one (named scalars instead of an array indexed in a loop) was consistently faster than the second across several repeated runs.
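If you want to isolate the per-line cost from file I/O, one way is to compare the two loop bodies over in-memory lines with the core Benchmark module. The sketch below is my own illustration, not part of the timings above; the 10,000-line sample size and the subroutine names are arbitrary choices.

# Minimal sketch (assumptions noted above): compare array indexing in a
# loop against named scalars, using Benchmark::cmpthese on in-memory data.
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build a small in-memory sample: 5 comma-separated integers per line.
my @lines = map { join ',', map { int rand 1000 } 1 .. 5 } 1 .. 10_000;

cmpthese(-3, {
    array_index => sub {
        my @a = (0) x 5;
        for (@lines) {
            my @d = split /,/;
            $a[$_] += $d[$_] for 0 .. 4;
        }
    },
    named_scalars => sub {
        my ($ta, $tb, $tc, $td, $te) = (0) x 5;
        for (@lines) {
            my ($a, $b, $c, $d, $e) = split /,/;
            $ta += $a; $tb += $b; $tc += $c; $td += $d; $te += $e;
        }
    },
});

cmpthese prints a rate table for the two styles, which should show the same ordering as the wall-clock timings above without the read-from-disk overhead.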
Cheers,
R.
Replies are listed 'Best First'.

Re^2: Efficient way to sum columns in a file
by sk (Curate) on Apr 13, 2005 at 18:12 UTC

Re^2: Efficient way to sum columns in a file
by Roy Johnson (Monsignor) on Apr 13, 2005 at 20:40 UTC