However, your tasks all look heavily I/O-bound, and I/O tends to lend itself well to parallelization. It would take a lot more work, but if you made good use of something like Parallel::ForkManager to parallelize the work, you could get big wins. Suppose you found that you could run 4 processes at once without them interfering with each other. If you rewrote the whole thing to take advantage of that, your 2-hour job would drop to 30 minutes!
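For concreteness, here is a minimal sketch of that pattern; the @tasks list and do_io_bound_work() are placeholders for whatever your job actually does:

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @tasks = (1 .. 20);        # placeholder for your real work items

# Cap concurrency at 4 children; tune this number by benchmarking.
my $pm = Parallel::ForkManager->new(4);

for my $task (@tasks) {
    $pm->start and next;      # parent: fork a child, move on to the next task
    do_io_bound_work($task);  # child: run the task...
    $pm->finish;              # ...then exit
}
$pm->wait_all_children;       # parent: wait for the stragglers

sub do_io_bound_work {
    my ($task) = @_;
    sleep 1;                  # stand-in for the real I/O
}
```

The nice thing about this idiom is that the loop body reads almost like the serial version; Parallel::ForkManager handles the forking, the concurrency cap, and the reaping for you.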
You'll have to benchmark to find where you hit the point of diminishing returns from parallelizing, but I'd consider only being able to benefit from 4 processes at once a disappointing result. Before you start having visions of running 8 or 16 processes at once, though, note that you undoubtedly spend at least a little time on work that cannot be parallelized. Time spent with, for instance, a remote connection saturated on bandwidth is not going to go away when you parallelize.
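That serial fraction is exactly what Amdahl's law describes, and it caps your speedup no matter how many workers you add. A quick back-of-the-envelope calculation (the 90% figure is just an assumption for illustration):

```perl
use strict;
use warnings;

# Amdahl's law: if a fraction $p of the runtime parallelizes perfectly,
# the best possible speedup with $n workers is 1 / ((1 - $p) + $p / $n).
sub speedup {
    my ($p, $n) = @_;
    return 1 / ((1 - $p) + $p / $n);
}

# Assume 90% of the runtime parallelizes:
printf "4 workers:  %.2fx\n", speedup(0.9, 4);    # ~3.08x, not 4x
printf "16 workers: %.2fx\n", speedup(0.9, 16);   # ~6.40x, not 16x
```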
So it will take more work than you were planning on, but a rewrite should be able to achieve significant performance gains, though only if you look for them in a different place than you were looking.
UPDATE: graff's benchmark at Re^3: BASH vs Perl performance suggests that the overhead for launching a process is much higher than I'd have thought: on the order of 0.035 seconds per process on his laptop. If that holds true on the hardware you're running, not launching so many processes could be worth a lot more performance than I would have thought.
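If you want to check that number on your own hardware, here is a rough sketch that times fork+exec of a do-nothing program (it assumes /bin/true exists; adjust the path for your system):

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Average the cost of launching a do-nothing external program.
my $n  = 200;
my $t0 = [gettimeofday];
for (1 .. $n) {
    system('/bin/true') == 0 or die "can't run /bin/true: $?";
}
printf "%.4f seconds per process launch\n", tv_interval($t0) / $n;
```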