in reply to Re^3: BASH vs Perl performance
in thread BASH vs Perl performance
"You are missing the point."

Actually, I would interpret my results to mean that a noticeable amount of excess time may be taken up by the OS doing process management. If I understood jcoxen's problem correctly, the shell-script version, which is taking 2 hours, runs many short, simple processes on each file in a set of thousands of files. That's a lot of processes, even if they are just "mv" and "cp" and "sed" and other basic, low-footprint utilities. When you run many thousands of these simple little processes in rapid succession, you really start to notice how heavy a load process management can be when it's pushed to the limit.

"Unfortunately, they kind of pale in comparison to the 2 hours runtime the script currently takes… Is it worth going to any lengths to take 3 minutes off the runtime of a 2-hour job? Hardly."
I'm suggesting that the sheer quantity of processes being run by the OP's shell script is a major factor in the total time it takes -- that, most likely, is the point. (I'm assuming jcoxen had some evidence for deciding that most of the time was not being spent downloading the rar files, but rather in the subsequent shuffling/editing of thousands of data files.)
My test involved a relatively small-scale comparison -- 3000 quick/simple processes vs. 1 perl process. Extrapolating from that to a bigger task involving (let me guess) 300,000 quick/simple processes vs. 100 perl processes, I would expect the time savings to scale proportionally: still nearly 2 to 1, but measured in hours instead of seconds.
I wasn't trying to present a specific solution for the given task, or to assert that perl will always be better/faster than a shell script -- I just wanted to highlight the impact of running way too many processes.