in reply to Re: fast way to split and sum an input file
in thread fast way to split and sum an input file

#1 - This is a totally arbitrary BS test.
#2 - The "right tool for the job" argument has no effect against people who don't like the language.
#3 - The original "challenge" was:
Take 1 million records.
They will contain comma-separated numbers, with at least 100 fields per record.
Sum the hundredth field.

The so-called expert got it to run in 3 seconds in C, 3 seconds in Awk, and 3 MINUTES in Perl.

In my historical experience from about 10 years ago, after already being a C programmer for 10 years, most text processing in Perl ran quite close to C (timing-wise), while being MUCH easier to write and debug.

So of course, I called BS. And had to test.

The test file is about 486MB.
The test system is a quad Opteron at 2.2GHz.
Data is in Linux cache memory during test.

My C code (just optimized: I pulled out the first indexed split and replaced it with a pointer loop that walks the string counting commas) is now down to 1.766 seconds.
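A minimal sketch of that pointer-loop optimization (my reconstruction from the description above, not the actual code): instead of splitting the record into fields, count 99 commas with a walking pointer and convert only the one field that matters.

```c
#include <stdlib.h>

/* Reconstruction of the optimization described above: walk a pointer
 * down the string counting commas.  After the 99th comma the pointer
 * sits at the start of field 100, and a single atof() call finishes
 * the job -- no copying, no tokenizing. */
double hundredth_field_count(const char *p)
{
    int commas = 0;
    while (commas < 99) {
        if (*p == '\0')
            return 0.0;    /* record has fewer than 100 fields */
        if (*p == ',')
            commas++;
        p++;
    }
    return atof(p);        /* atof() stops at the next comma */
}
```

The win comes from touching each byte exactly once and materializing only one field per record, instead of 100 tokens.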

My simple 2-line AWK runs in 1 minute 7 seconds - so I say the original AWK figure is a total crock; maybe it got garbled in a game of whisper down the lane.

My Perl code using 5.8 ran in 54 seconds, and using 5.6.1 it ran in 17 seconds. Good reason to keep 5.6.1 around.

Of course, after coding in C for quite a while, and moving to Perl 10 years ago, I KNOW this is a crock, but for our very large data warehousing code (with lots of text munging that can run for days) we occasionally need to determine if the code effort is worth the trade off.

So at this point the difference for this simple test is 10 to 1. I suspect that with a more text-focused task - a bit more complex, a bit harder to optimize in C - the tradeoff would lean to Perl for both speed of execution AND speed of development.
