in reply to BASH vs Perl performance

I'd be surprised if a straightforward rewrite bought you major performance gains. Let's sanity check. You end up with 20,000 files. If you have to start 5 processes per file, then you're launching 100,000 processes. If it takes 0.001 seconds to launch a process, then removing that overhead saves you 100 seconds. Even if it takes an absurd 0.01 seconds per process, that is 1,000 seconds, or only about 15% of your total time. Probably not worth it. Update: But you should still run a benchmark to see what the overhead really is for you. It may be much larger than I'm estimating.
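For anyone who wants to replay that arithmetic with their own numbers, here is a minimal sketch in Python (chosen only for illustration; the same few lines translate directly to Perl). The file count and processes-per-file come from the estimate above. The 2 hour total is taken from the "2 hour job" figure used later in this post, and the function name `launch_overhead` is made up for this example.

```python
# Back-of-the-envelope estimate of process-launch overhead.
FILES = 20_000                    # from the post
PROCS_PER_FILE = 5                # from the post's "5 processes per file" guess
TOTAL_JOB_SECONDS = 2 * 60 * 60   # assumption: the whole job takes about 2 hours


def launch_overhead(seconds_per_launch):
    """Return (total seconds spent launching, fraction of the whole job)."""
    launches = FILES * PROCS_PER_FILE
    total = launches * seconds_per_launch
    return total, total / TOTAL_JOB_SECONDS


if __name__ == "__main__":
    # Try the optimistic and the "absurd" per-launch costs from the post.
    for cost in (0.001, 0.01):
        total, share = launch_overhead(cost)
        print(f"{cost:.3f} s/launch -> {total:,.0f} s ({share:.0%} of the job)")
```

Plug in whatever per-launch cost your own benchmark reports; the conclusion flips quickly once that number gets into the hundredths of a second.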

However, your tasks all look heavily I/O bound, and I/O-bound work tends to lend itself well to parallelization. It would take a lot more work, but if you made good use of something like Parallel::ForkManager to parallelize the work, you could get big wins. Suppose that you found that you could run 4 processes at once without them interfering with each other. If you rewrote the whole thing to take advantage of that, then your 2 hour job drops to 30 minutes!
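To make the "4 at once" idea concrete, here is a rough sketch of a bounded worker pool. Note that this is not Parallel::ForkManager: it uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in, which is a reasonable fit only because the work is assumed to be I/O bound. `handle_file` is a hypothetical placeholder (it just sleeps to simulate waiting on disk or network), and the file names are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # the "4 at once" figure from the post; tune by benchmarking


def handle_file(name):
    """Hypothetical stand-in for whatever is done per file."""
    time.sleep(0.05)   # simulates time spent waiting on I/O
    return name.upper()


def run_all(names, workers=MAX_WORKERS):
    # At most `workers` tasks run at the same time; map() returns
    # the results in the same order as the input names.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_file, names))


if __name__ == "__main__":
    files = [f"file{i:02d}.dat" for i in range(20)]   # invented names
    for workers in (1, 4):
        start = time.perf_counter()
        run_all(files, workers=workers)
        elapsed = time.perf_counter() - start
        print(f"{workers} worker(s): {elapsed:.2f} s")
```

The useful part is the comparison loop at the bottom: rerun it with different worker counts against the real per-file work to see where the gains level off.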

You'll have to benchmark to find where you hit the point of diminishing returns from parallelizing, but I'd be disappointed if you could only benefit from 4 processes at once. Still, before you start having visions of being able to run 8 or 16 processes at once, note that you undoubtedly spend at least a little bit of time doing non-parallelizable work. Time spent with, for instance, a remote connection saturated on bandwidth is not going to go away when you parallelize.
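The usual way to put numbers on that caveat is Amdahl's law: if a fraction p of the run time parallelizes and the rest does not, n workers give a speedup of 1 / ((1 - p) + p / n). A small sketch follows; the 90% figure is purely an assumption for illustration, not a measurement of this job.

```python
def speedup(parallel_fraction, workers):
    """Amdahl's law: overall speedup when only part of the job parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumption for illustration: 90% of the run time parallelizes cleanly.
    for n in (1, 4, 8, 16):
        print(f"{n:2d} workers -> {speedup(0.9, n):.2f}x")
```

Under that assumption, 16 workers yield roughly a 6.4x speedup rather than 16x, and no number of workers can do better than 10x, because the serial 10% never shrinks.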

So it will take more work than you were planning on, but a rewrite should be able to achieve significant performance gains. But only if you look for the performance gains in a different place than you were looking.

UPDATE: graff's benchmark at Re^3: BASH vs Perl performance suggests that the overhead for launching a process is much higher than I'd have thought, on the order of 0.035 seconds per process on his laptop. At 100,000 launches that comes to about 3,500 seconds. If that holds true on the hardware that you're running, eliminating those process launches could be worth a lot more performance than I would have thought.
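If you want a quick number for your own hardware, something along these lines would do. This is not graff's benchmark (I'm not reproducing his method here), just a minimal sketch that times how long it takes to start a trivial command and wait for it to exit. It assumes a `true` utility on the PATH and falls back to a do-nothing Python process otherwise; `seconds_per_launch` is a name made up for this example.

```python
import shutil
import subprocess
import sys
import time


def seconds_per_launch(command, runs=200):
    """Average wall-clock time to start `command` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Assumption: a Unix-like system where `true` exits immediately.
    true_path = shutil.which("true")
    command = [true_path] if true_path else [sys.executable, "-c", "pass"]
    per_launch = seconds_per_launch(command)
    print(f"{per_launch:.4f} s per launch")
    print(f"{per_launch * 100_000:,.0f} s for 100,000 launches")
```

The measured figure includes the cost of launching from Python itself, so treat it as a rough upper bound on what a shell script pays per process, not as an exact match.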

Re^2: BASH vs Perl performance
by jcoxen (Deacon) on Aug 11, 2004 at 16:20 UTC
    I looked over the info for Parallel::ForkManager and graff's benchmarks...and my system is fairly I/O bound. Looks like it's time for a rewrite. Best case, I get a big speed increase. Worst case, I learn new stuff. Either way, I have fun.

    Thanks to everyone for the thoughts and comments.

    Jack