kosie99 has asked for the wisdom of the Perl Monks concerning the following question:

I was wondering what would be the best way to do the following: I have a script that I want to run on a number of files in a directory, once for each file found, with all of them running in parallel. When ALL of these files have been processed, I want a subsequent script to kick off. E.g.:

for my $f (@files) { system("proc_script $f &"); }
system("next_script");

What would be the best way to make sure all the proc_script processes completed fully before kicking off next_script? Thanks!
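For reference, here is a minimal sketch of what I imagine a fork/waitpid version would look like (untested; it assumes proc_script takes a single filename argument and that @files already holds the filenames):

    use strict;
    use warnings;

    my @pids;
    for my $f (@files) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # child: replace this process with proc_script for one file
            exec('proc_script', $f) or die "exec failed: $!";
        }
        push @pids, $pid;    # parent: remember the child's PID
    }

    # block until every proc_script child has exited
    waitpid($_, 0) for @pids;

    system('next_script');

Is something like this reasonable, or is there a better way?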

Replies are listed 'Best First'.
Re: Best way to synchronise scripts
by Anonymous Monk on Jan 07, 2014 at 08:37 UTC
Re: Best way to synchronise scripts
by Anonymous Monk on Jan 07, 2014 at 14:23 UTC
    Also: this is generally the purview of "workload managers" such as, in the commercial world, Tivoli. There are open-source batch workload managers out there, too. It might be more convenient overall to control and synchronize the execution of the scripts externally to all of them, without writing a custom script for this rather commonly requested task. (Recovery plans, e.g. what to do when a particular script fails, are also easy to handle with such systems.)
Re: Best way to synchronise scripts
by Tanktalus (Canon) on Jan 09, 2014 at 05:07 UTC

    "Best"?

    What may be best for me may not be best for you. Heck, I've done this in at least three different ways, none of which is likely to be considered "best" by at least a handful of others, probably many more.

    I've done forking/execing/waiting by hand. I've used Parallel::ForkManager (in the child you can simply exec your proc_script and then $pm->wait_all_children after you're done forking them all off). Most recently, I've adopted AnyEvent to manage this. And I'm looking at AnyEvent::Fork and friends (e.g., AnyEvent::Fork::RPC) for the future.
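    For concreteness, a minimal sketch of the Parallel::ForkManager route (assuming proc_script takes one filename argument, and picking an arbitrary limit of 4 concurrent children):

        use strict;
        use warnings;
        use Parallel::ForkManager;

        my @files = glob('*.dat');                 # hypothetical file list
        my $pm    = Parallel::ForkManager->new(4); # run at most 4 children at once

        for my $f (@files) {
            $pm->start and next;                   # parent gets the child PID and moves on
            exec('proc_script', $f)                # child becomes proc_script
                or die "could not exec proc_script: $!";
        }

        $pm->wait_all_children;                    # returns only when every child has exited
        system('next_script');

    The limit passed to new() is what keeps you from launching one process per file all at once.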

    Proc::JobQueue also looks interesting, though it might be a bit heavy for what you're doing.

    These all have pros and cons, so it really depends on what you want to do. I suspect that Parallel::ForkManager might be the easiest solution I'm familiar with for Doing The Right Thing here, because you don't really want to fork off a separate script for each file all at the same time. Parallel::ForkManager is one solution (of at least a few, I'm sure) that makes it easy to avoid that, by placing a limit on how many forks are running at any one time.