jamesgerard1964 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use Parallel::ForkManager to run an external script on about 200 servers. I am currently running on a Linux server running RH. The external script is kicked off via an ssh command. The server hangs just about every day and has to be rebooted to recover. Can someone tell me whether the external command should be kicked off with 'system', 'exec', or backticks? I don't need to wait on any output from the script. Currently I'm using backticks.


my $pm = Parallel::ForkManager->new( 10 );
$SIG{ALRM} = sub { die ("TimeOut"); };
eval {
    alarm( 300 );
    foreach my $server (@servers) {
        my $pid = $pm->start and next;
        my $fqdn = "$server.$domain";
        my $status = `ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no $id\@$fqdn cat <$script \"|\" 2>/dev/null $int - --fromhost $whoami`;
        sleep 4;
        $pm->finish;
    }
    $pm->wait_all_children();
    alarm(0);
}; # end of eval

Re: Parallel::ForkManager and possible memory leak
by salva (Canon) on Feb 17, 2015 at 16:27 UTC
    The server hangs

    How? In which way?

    Calling alarm may leak memory, but not so much or so fast as to hang a server. On the other hand, you may be leaving lots of zombie processes behind.

    Try moving the alarm(300) code after the line calling $pm->start, so that it runs on the children.
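One possible reading of this suggestion, as a minimal sketch rather than a tested fix for the hang: arm the timer in each child right after `$pm->start`, so the limit applies per host instead of to the whole loop. Since the poster does not need the output, the sketch also swaps the backticks for `exec`; that is a substitution, not something proposed in the thread. It relies on a pending alarm surviving `exec` (documented in Linux alarm(2)) and on the default SIGALRM action terminating the new process. `run_parallel` and `$build_cmd` are hypothetical names introduced here for illustration.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical helper: run one command per host, at most $max at a time,
# with each child limited to $timeout seconds.
# Returns a hash ref mapping host => "exit N" or "signal N".
sub run_parallel {
    my ($max, $timeout, $build_cmd, @hosts) = @_;
    my %status;
    my $pm = Parallel::ForkManager->new($max);

    # Called in the parent every time a child has been reaped.
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident, $exit_signal) = @_;
        $status{$ident} = $exit_signal ? "signal $exit_signal"
                                       : "exit $exit_code";
    });

    for my $host (@hosts) {
        $pm->start($host) and next;    # parent: go on to the next host

        # Child: the timer is armed here, so it limits this host only.
        alarm($timeout);

        # Replace the child with the command itself (no shell, no
        # grandchild). Perl signal handlers do not survive exec, so an
        # expired alarm simply terminates the command.
        my @cmd = $build_cmd->($host);
        exec { $cmd[0] } @cmd or do {
            warn "exec failed for $host: $!\n";
            $pm->finish(127);          # exits the child
        };
    }

    $pm->wait_all_children;
    return \%status;
}

# Usage sketch (not run here; $id and $domain as in the original post,
# 'some-remote-command' is a placeholder):
#   my $result = run_parallel(10, 300,
#       sub { ('ssh', "$id\@$_[0].$domain", 'some-remote-command') },
#       @servers);
```

Because the child becomes the ssh process, nothing is left running after a timeout, and the parent still reaps every child through `wait_all_children`, which should also keep zombies from piling up.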

    Also, add some print statements here and there in order to see what your script is doing.

    Update: oh, and BTW, you may like to check my module Net::OpenSSH::Parallel!

Re: Parallel::ForkManager and possible memory leak
by bitingduck (Deacon) on Feb 17, 2015 at 16:41 UTC

    Does the server not hang if you don't run the script? Are there logs that you can sift through to see what's happening when the server hangs and if it's consistent in some way?

      When I say it hangs, I mean it is not accessible. I cannot log in, scp, sftp, anything. It is pingable, but that is all. I have print statements in the code; I didn't post them because I just wanted to be sure the code portion is correct. This hanging issue has only happened since I started forking. If I remove Parallel::ForkManager and just let the script run one server after the other, it runs fine, but it takes well over 5 minutes to complete. I have a snippet of logging from yesterday, when the server became unusable. The print statements say when the program starts and when the child processes have ended. It appears that the server ran into trouble when we hit the timeout condition.



      02/16/15 10:34:03 Program SendRemote_PullMetrics Started

      Mon Feb 16 10:36:28 2015: Waiting For All Child To Finish

      Mon Feb 16 10:36:34 2015: All processes finished.

      02/16/15 10:36:44 Program SendRemote_PullAPIMetrics Completed



      02/16/15 10:39:05 Program SendRemote_PullMetrics Started

      Mon Feb 16 10:41:39 2015: Waiting For All Child To Finish

      Mon Feb 16 10:41:45 2015: All processes finished.

      02/16/15 10:41:58 Program SendRemote_PullAPIMetrics Completed



      02/16/15 10:44:07 Program SendRemote_PullMetrics Started

      TimeOut Over 3 Minutes to Complete All Servers, Continuing...


      Never got a complete message and then the Server became unusable.

Re: Parallel::ForkManager and possible memory leak
by stonecolddevin (Parson) on Feb 17, 2015 at 17:20 UTC

    You really need to look into https://code.google.com/p/parallel-ssh/ for this, in my opinion.
