He is using a grand total of 2 wget processes.

True enough. (I had glossed over that part of the script.)

I strongly doubt that using Archive::Rar... is going to be a win

Agreed.

He can save mv processes by using xargs.

Most of his sed filters can be condensed.

This is where I'm doubtful. Maybe xargs can support something like what the OP's script is doing, but frankly I think a simple perl script could do it more cogently, and could replace all the sed filtering as well. Again, one perl process working on a list of thousands of files will be a win over large numbers of mv and sed jobs, even if xargs is helping.

Unless a person is really expert at shell, sed, xargs, etc., while being really new to Perl, I'd think using Perl here would be fruitful and worth the time spent. And having taken the first step, it may be worthwhile to consider how Perl scripting could provide other optimizations that might be hard to achieve in shell scripting.

Update: I seem to be contradicting tilly's estimates about the overall impact of process management. I'll stand by my claim, based partly on the evidence in my "rename vs. mv" test, and partly on other experience I've had (on Solaris, as it happens), where I altered a perl script from doing something like this:

    my @cksums;
    my @files = `find $path -type f`;   # apologies to etcshadow
    chomp @files;
    push @cksums, `cksum $_` for ( @files );
to doing something like this, which produces the same result:
    my @cksums = `find $path -type f | xargs cksum`;
The difference was dramatic. In that case, a lot of the overhead was presumably due to starting lots of shell processes, each one running just one cksum, which probably makes it an "unfair" comparison. Still, it was dramatic.
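The saving comes from xargs packing many file names into each command invocation instead of running one process per file. A tiny illustration of the batching behavior (unrelated to the timings above); the -n flag caps the batch size so the effect is visible:

```shell
# xargs collects its input lines and passes them to echo in batches;
# with -n 2, four inputs become two echo invocations of two args each.
printf 'a\nb\nc\nd\n' | xargs -n 2 echo
# prints:
# a b
# c d
```

Without -n, xargs packs as many arguments as the system's argument-length limit allows, so 12,000 file names typically collapse into a handful of cksum invocations.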

I decided to retest on my Mac OS X laptop, in a directory that includes lots of software distributions: nearly 12,000 data files, many of them very small, but not all: total space consumed is 10.5 GB (oops: forgot the "-k" flag on du). To make it less lopsided, I compared these, in the order shown (in case there was an advantage to going second):

    time perl -e '@cksums = `find . -type f -print0 | xargs -0 cksum`'

    time perl -e '$/="\x0"; open(I,"find . -type f -print0 |"); open(SH,"|/bin/sh");
                  while (<I>) { chomp; print SH "cksum \"$_\" > /dev/null\n" }'
The version with xargs took 7 minutes 5 seconds; the version that ran 12,000 cksum processes within a single shell (doing slightly more work in perl, but not trying to store the results anywhere) took 14 minutes 13 seconds. I'd have to attribute most of the difference to process-management overhead, and I think there's something missing in tilly's estimates.
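For anyone who wants to see the per-process overhead directly without building a 12,000-file tree, a hypothetical micro-benchmark (not the timings above) is to fork a trivial command many times; /bin/true does no work, so nearly all of the loop's wall time is fork/exec cost:

```shell
# 1000 fork/exec cycles of a command that does nothing: the elapsed
# time reported is almost pure process-management overhead.
time sh -c 'i=0; while [ $i -lt 1000 ]; do /bin/true; i=$((i+1)); done'

# One invocation, for contrast.
time /bin/true
```

The gap between the two scales linearly with the file count, which is consistent with the xargs version above winning by roughly a factor of two.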

In reply to Re^3: BASH vs Perl performance by graff
in thread BASH vs Perl performance by jcoxen
