He is using a grand total of 2 wget processes.
True enough. (I had glossed over that part of the script.)
I strongly doubt that using Archive::Rar... is going to be a win
Agreed.
He can save mv processes by using xargs.
Most of his sed filters can be condensed.
This is where I'm doubtful -- maybe xargs can support something like what the OP's script is doing, but frankly I think a simple perl script could do it more cogently, and could replace all the sed filtering as well. Again, one perl process working on a list of thousands of files will be a win over large numbers of mv and sed jobs, even if xargs is helping.
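For instance, here's a rough sketch of what I have in mind (the s/// rules are made-up placeholders, since I don't know exactly what the OP's sed filters do, and the script name below is invented):

#!/usr/bin/perl
# Rough sketch: one perl process renames an arbitrarily long list of files,
# replacing one mv (plus assorted sed calls) per file.
use strict;
use warnings;

while ( my $old = <STDIN> ) {      # e.g. fed by:  find . -type f | perl fixnames.pl
    chomp $old;
    my $new = $old;
    $new =~ s/%20/_/g;             # placeholder for whatever the sed filters do
    $new =~ s/\.HTM$/.html/;       # another made-up rule
    next if $new eq $old;
    rename( $old, $new ) or warn "can't rename '$old': $!\n";
}

That's one process for the whole job, and the substitutions can be as hairy as you like without paying for another sed each time.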
Unless a person is really expert at shell, sed, xargs, etc, while being really new to Perl, I'd think using Perl here would be fruitful and worth the time spent. And once that first step is taken, it may be worthwhile to consider how Perl scripting could provide other optimizations that might be hard to achieve in shell scripting.
Update: I seem to be contradicting tilly's estimates about the overall impact of process management. I'll stand by my position, based partly on the evidence in my "rename vs. mv" test, and partly on other experience I've had (on Solaris, as it happens), where I altered a perl script from doing something like this:
my @cksums;
my @files = `find $path -type f`; # apologies to etcshadow
chomp @files;
push @cksums, `cksum $_` for ( @files );  # one cksum run, in its own subprocess, per file
to doing something like this, which produces the same result:
my @cksums = `find $path -type f | xargs cksum`;  # one pipeline: xargs batches the files into just a few cksum runs
The difference was dramatic. In that case, a lot of the overhead was presumably due to starting lots of shell processes, each one running just one cksum, which probably makes it an "unfair" comparison. Still, it was dramatic.
I decided to retest on my macosx laptop, in a directory that includes lots of software distributions: nearly 12,000 data files, and lots of these are very small -- but not all of them: total space consumed is 5 GB, not 10 (oops -- forgot the "-k" flag on du). To make it less lopsided, I compared these -- in the order shown (in case there was an advantage to going second):
time perl -e '@cksums = `find . -type f -print0 | xargs -0 cksum`'
time perl -e '$/="\x0";
open(I,"find . -type f -print0 |" );
open(SH,"|/bin/sh");
while (<I>) { chomp; print SH "cksum \"$_\" > /dev/null\n" }'
The version with xargs took 7 minutes 5 sec; the version with 12,000 cksum processes run within a single shell (doing slightly more work in perl, but not trying to store the results anywhere) took 14 minutes 13 sec. I'd have to attribute most of the difference to process management issues, and I think there's something missing in tilly's estimates.
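Of course, if the point is to minimize process creation, the logical extreme is to skip the external commands entirely and do the digesting inside perl. Here's a rough sketch of that idea (not something I timed above, and it computes MD5 digests via Digest::MD5 rather than the CRC that cksum produces -- it's just to illustrate the zero-subprocess approach):

#!/usr/bin/perl
# Rough sketch (not benchmarked above): no find, cksum or shell processes
# at all -- one perl process walks the tree and digests each file itself.
# Digest::MD5 is used for illustration; it is not the same checksum as cksum.
use strict;
use warnings;
use File::Find;
use Digest::MD5;

my @cksums;
find( sub {
    return unless -f $_;                  # regular files only, like "find -type f"
    open( my $fh, '<', $_ ) or return;    # skip anything unreadable
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    push @cksums, "$digest  $File::Find::name\n";
}, '.' );
print @cksums;

Whether that actually beats "find | xargs cksum" would depend on how perl's own I/O compares to cksum's, but at least the process count drops to one.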