You are missing the point. ...

Unfortunately, they kind of pale in comparison to the 2 hours runtime the script currently takes…

Is it worth going to any lengths to take 3 minutes off the runtime of a 2-hour job? Hardly.

Actually, I would interpret my results to mean that a noticeable amount of excess time may be taken up by the OS doing process management. If I understood jcoxen's problem correctly, the shell script version, which is taking 2 hours, is running many short/simple processes on each file in a set of thousands of files. That's a lot of processes, even if they are just "mv" and "cp" and "sed" and other basic, low-footprint utilities. When you do many thousands of these simple little processes in rapid succession, you can really start to notice how heavy a load process management can be when it's pushed to the limit.
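A minimal bash sketch of that effect, for anyone who wants to see it on their own machine. This is not the test I ran and not jcoxen's script: the function names, the item count `n`, and the choice of `tr` as the stand-in "simple utility" are all assumptions made for illustration. Both functions produce identical output; the only difference is whether the external utility is started once per item or once for the whole batch.

```shell
#!/usr/bin/env bash
# Hypothetical demo: uppercase n short lines in two ways.
# n is an assumed item count; override with N=... in the environment.
n=${N:-500}

# One pipeline (a forked subshell plus one external tr process) per item.
per_item() {
    local i
    for ((i = 0; i < n; i++)); do
        printf 'line %d\n' "$i" | tr 'a-z' 'A-Z'
    done
}

# The same work, but a single tr process handles every line.
batched() {
    local i
    for ((i = 0; i < n; i++)); do
        printf 'line %d\n' "$i"
    done | tr 'a-z' 'A-Z'
}

# bash's "time" keyword reports real/user/sys for each variant on stderr.
time per_item > /dev/null
time batched > /dev/null
```

The absolute numbers will depend on the OS and hardware, so none are quoted here; the thing to watch is how the gap between the two timings grows as `n` goes up.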

I'm suggesting that the sheer quantity of processes being run by the OP's shell script is a major factor in the total time it takes -- and that is likely the real point. (I'm assuming jcoxen had some evidence for deciding that most of the time was being taken up not by downloading the rar files, but by the subsequent shuffling/editing of thousands of data files.)

My test involved a relatively small-scale comparison -- 3000 quick/simple processes vs. 1 perl process. Extrapolating from that to a bigger task involving (let me guess) 300,000 quick/simple processes vs. 100 perl processes, I would expect the time savings to be proportional: still nearly 2 to 1, but on the scale of hours instead of seconds.

I wasn't trying to present a specific solution for the given task, or to assert that perl will always be better/faster than a shell script -- I just wanted to highlight the impact of running way too many processes.


In reply to Re^4: BASH vs Perl performance by graff
in thread BASH vs Perl performance by jcoxen
