Technext has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to run a few child processes on different platforms in parallel. The parent should proceed only once all the child processes have completed on their respective platforms.

The problem is that when I use fork and then run the ‘exec’ command in the child process, it ends almost instantly. Also, the output isn't consistent. Almost every time the log shows only one line.

    -bash-2.05b$ cat Agent.SOLSPARC
    caught SIGTERM signal, cleaning up

or

    -bash-2.05b$ cat Agent.SOLSPARC
    Host: EBSO9SPC Login: esm2

Sometimes there are a few extra lines, followed at the end by the message 'Killed by signal 15'. The command that I use in 'exec' actually calls a script which connects to remote boxes and runs the make command on them. For testing purposes, I am currently passing only one platform, i.e. SOLSPARC. Also, I'm only interested in knowing whether a command finished on any given platform.

I was not sure whether I was passing all the arguments to ‘exec’ correctly, so I tried different combinations (after referring to various links on the Internet), but to no avail. One important observation is that when I used strace to debug this issue, the command worked fine. I saw in the perldoc that exec uses /bin/sh -c on Unix platforms, but that this varies on other platforms. Could it be that exec and strace use different shells?

Here’s the relevant portion of my code:

    sub compile {
        my %child_pids;
        foreach $plat (0 .. $#plat_list) {
            my $pid = fork;
            # Didn't check the undef condition for child
            if ($plat_list[$plat] eq "SOLSPARC") {
                print "\nStarted Solaris build \n";
                if ($pid == 0) {
                    print "Inside Child Process \n\n";
                    exec ( "${ROOT}/${REM_EXEC} -t 1200 -c \"make LANG=en_US distclean \" -b ${ROOT} -l Agent. $plat_list[$plat]" ) or die "exec failed";
                }
                elsif ($pid > 0) {
                    $child_pids{"SOLSPARC"} = $pid;
                }
            }
            else {
                print "\nStarted build for other platforms \n";
                if ($pid == 0) {
                    print "Inside Child Process \n\n";
                    exec ( "${ROOT}/${REM_EXEC} -t 1200 -c \"make LANG=en_GB clean \" -b ${ROOT} -l Agent. $plat_list[$plat]" ) or die "exec failed";
                }
                elsif ($pid > 0) {
                    $child_pids{"$plat_list[$plat]"} = $pid;
                }
            }
        }
        my %rev_child_pids = reverse %child_pids;
        while ((my $kid = waitpid -1, WNOHANG) > 0) {
            if ($rev_child_pids{$kid} eq "SOLSPARC") {
                print "\nChild process completed for SOLARIS platform $rev_child_pids{$kid} \n";
                print "Run some other command here \n";
            }
            else {
                print "\nChild process completed for other platform $rev_child_pids{$kid} \n";
                print "No more commands to run \n";
            }
        }
    }

I tried the following two forms as well but still there is no change in output:

exec ( "${ROOT}/${REM_EXEC}", "-t 1200", "-c \"make LANG=en_US distclean \"", "-b ${ROOT}", "-l Agent.", "$plat_list[$plat]" ) or die "exec failed";

and

exec ( "${ROOT}/${REM_EXEC}", "-t", "1200", "-c", "\"make LANG=en_US distclean \"", "-b", "${ROOT}", "-l", "Agent.", "$plat_list[$plat]" ) or die "exec failed";

The script, ${REM_EXEC}, which runs the code on remote machine is pasted here: http://pastebin.com/j4MJgPPL

Any suggestions?

Replies are listed 'Best First'.
Re: Unable to 'exec'
by locked_user sundialsvc4 (Abbot) on Aug 17, 2012 at 12:32 UTC

    You might find it very useful to look at Parallel::ForkManager, or any of several others in the Parallel family.   There’s a lot of “gooey glue” that has to be dealt with in all programs like this, which you have to get exactly right but that is hard to get exactly right ... and these modules will do a lot of that grunt-work for you.

    Getting stuff to happen on multiple machines is a bit more complicated, but there are CPAN modules for that, too, so I am told.
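
As a rough illustration of the Parallel::ForkManager approach suggested above, the sketch below starts one child per platform and blocks until all of them have exited. The platform names other than SOLSPARC, the %exit_code_for hash, and the stand-in command (the running perl with -e 'exit 0') are assumptions for illustration only; the real script would exec its remote-execution wrapper instead.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

$| = 1;    # unbuffered STDOUT so pending output is not duplicated across fork

# Hypothetical platform list; only SOLSPARC appears in the original post.
my @plat_list = qw(SOLSPARC LINUX AIX);
my %exit_code_for;    # platform => exit code, filled in by the parent

# Allow as many simultaneous children as there are platforms.
my $pm = Parallel::ForkManager->new(scalar @plat_list);

# Called in the parent each time a child has been reaped.
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    $exit_code_for{$ident} = $exit_code;
    print "Child $pid for $ident finished with exit code $exit_code\n";
});

for my $plat (@plat_list) {
    # start() forks; it returns the child pid in the parent and 0 in the child.
    $pm->start($plat) and next;

    # Child: replace this process with the build command.
    # Stand-in command; list form, so no shell is involved.
    exec($^X, '-e', 'exit 0') or die "exec failed: $!\n";
}

# Blocks until every child started above has exited.
$pm->wait_all_children;
print "All builds done\n";
```

Because the child calls exec, it never reaches $pm->finish; the parent still reaps it in wait_all_children, so the parent cannot run ahead of the builds.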

      Thanks for your reply. I couldn't test this because I had no access to the system during the weekend.

      I had used Parallel::ForkManager earlier (in another script). I wanted to use fork here, so I never went for that. Anyway, I gave it a try just now. I get the same output. :(

      Am I wrong in assuming that the result would not have changed, because I still had to use the same exec command here too, between the first and third lines given below?

      my $pid = $pm->start and next;
      ... do some work with $data in the child process ...
      $pm->finish; # Terminates the child process

        Apologies! I did change my script to try the module you suggested, but while running it I was calling a different script :P Sorry for that.

        It's working with the Parallel::ForkManager module! :)

        Well, I am on the lookout for modules whenever I have some issue to fix. While writing this script, I thought, let's leave modules and use the basics ('fork'). Looks like it was the wrong choice for me, at least for this problem. :D Anyway, thanks a lot sundialsvc4! This issue had been bothering me for quite some time. :)
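
For readers who would rather stay with plain fork, here is a minimal sketch of the same wait-for-everything pattern using only core Perl. One point worth noting: waitpid with WNOHANG returns 0 straight away when children exist but none has exited yet, so a loop conditioned on "> 0" can end immediately and let the parent run on; a blocking waitpid (flags 0) avoids that. Whether this is what caused the SIGTERM in the original script is not confirmed in the thread. The second platform name, the hash names, and the stand-in command are assumptions for illustration.

```perl
use strict;
use warnings;

$| = 1;    # unbuffered STDOUT so pending output is not duplicated across fork

my @plat_list = qw(SOLSPARC LINUX);    # LINUX is a hypothetical second platform
my %plat_for_pid;                      # child pid => platform name
my %status_for;                        # platform name => raw wait status

for my $plat (@plat_list) {
    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;

    if ($pid == 0) {
        # Child: stand-in for the real build command (list form, no shell).
        exec($^X, '-e', 'sleep 1; exit 0') or die "exec failed: $!\n";
    }

    # Parent: remember which platform this child belongs to.
    $plat_for_pid{$pid} = $plat;
}

# Blocking wait: returns each child's pid as it exits, and -1 once
# there are no children left.
while ((my $kid = waitpid(-1, 0)) > 0) {
    my $plat = delete $plat_for_pid{$kid};
    $status_for{$plat} = $?;
    print "Child $kid for $plat finished, wait status $?\n";
}

print "All children done\n";
```

Keying the hash by pid also removes the need for the reversed hash in the original code.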

Re: Unable to 'exec'
by aitap (Curate) on Aug 17, 2012 at 13:11 UTC
    or die "exec failed";
    Will it print anything useful if you use exec ... or die "exec failed: $!\n";?
    Sorry if my advice was wrong.
      Thanks. I tried using die but it didn't help. exec always passes successfully.
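
When exec itself succeeds, as reported here, $! has nothing to say; how the child ended is recorded in the wait status that the parent receives in $?. The sketch below is one way to check whether a child was killed by a signal, such as the 'Killed by signal 15' seen in the log. It is an illustration under assumptions: the child is a stand-in that sends SIGTERM to itself, and the variable names are hypothetical.

```perl
use strict;
use warnings;

$| = 1;    # unbuffered STDOUT so pending output is not duplicated across fork

my $pid = fork;
die "fork failed: $!\n" unless defined $pid;

if ($pid == 0) {
    # Child: stand-in command that terminates itself with SIGTERM.
    # $! is only meaningful here if exec could not start the program at all.
    exec($^X, '-e', 'kill "TERM", $$; sleep 5') or die "exec failed: $!\n";
}

# Parent: block until this particular child has exited.
waitpid($pid, 0);

my $signal    = $? & 127;    # low 7 bits: signal that killed the child, if any
my $exit_code = $? >> 8;     # high byte: exit code on a normal exit

if ($signal) {
    print "Child $pid was killed by signal $signal\n";
}
else {
    print "Child $pid exited with code $exit_code\n";
}
```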