in reply to fast way to split and sum an input file

"perl was extremely slow and it is wiser to use the shell."
I find that hard to believe. Perl is slower than C, but if you're dealing with string manipulation, it's one of the fastest "scripting languages" around. I wouldn't be too surprised if a small awk script could do this task a bit faster than perl - awk was written for exactly this kind of task.
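
For what it's worth, perl handles this kind of field-splitting job about as tersely as awk does. Here's a rough one-liner sketch of the task as I understand it (summing the 100th whitespace-separated field of every line) - the file name and the field index are just my assumptions, not anything from the original posts:

    perl -lane '$sum += $F[99] if defined $F[99]; END { print $sum + 0 }' input.txt

The -a switch autosplits each line into @F on whitespace, which is essentially what the equivalent awk program would do by default.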

Anyway, you're pitting straight C against perl in your example - I don't see any awk or shell script - and for your particular code, it's not that surprising that the C code is faster - though I'd guess (blindly) that for this kind of task perl can attain roughly 1/5th of the speed of a C program.

But your C code isn't equivalent to the perl code: AFAIK the C code just reads 10,000 bytes, grabs the 100th field in those 10,000 bytes, and then gets the next 10,000 bytes. If I read it correctly, it'll even fail to get the 100th field on the first line if the very first field is empty (my C is rusty). Update: my C is indeed rusty: your code will work correctly as long as all lines are less than 10K in length and none of them start with an empty field. Your perl code, on the other hand, reads the file line by line and gets the 100th field of each line. As others have stated in this thread, one of the benefits of using perl over C is that it takes a lot less time to write a correct program in perl than in C - and computers don't get paid by the hour :-)
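
Spelled out as a script, the line-by-line approach you're describing would look roughly like this - the whitespace separator and the 100th field are my assumptions, since I'm going from the description rather than the original code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Read line by line, split each line into fields, and sum the 100th field.
    my $sum = 0;
    while (my $line = <>) {
        my @fields = split ' ', $line;               # awk-style whitespace split
        $sum += $fields[99] if defined $fields[99];  # 100th field is index 99
    }
    print "$sum\n";

Run it as something like perl sum100.pl yourfile (or feed it on STDIN); the split ' ' form skips leading whitespace the same way awk's default splitting does, and it never cares how long a line is.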
