troy99 has asked for the wisdom of the Perl Monks concerning the following question:

Wise and benevolent Monks, I have searched both Google and the archives for this problem and have not found any wisdom there.

I am using OpenSSH (via Net::OpenSSH) to establish connections to multiple servers and kick off a script on each. Yes, the code below establishes a connection from 'a' to 'a'. This is intentional, as 'a' needs to be treated just like the rest of the nodes in a distributed processing environment.

Here is the simplified code. From server 'a', I run a Perl script which contains:

    use Net::OpenSSH;

    my %ssh;    # per-host connection state

    sub main_script() {
        my @servers = ('a', 'b');

        my %con_opts;
        $con_opts{timeout} = 120;
        $con_opts{async}   = 1;

        my $script = "some_script.pl";

        my %opts;
        $opts{stdin_pipe}       = 1;
        $opts{stdout_pipe}      = 1;
        $opts{stderr_to_stdout} = 1;

        foreach my $host (@servers) {
            $con_opts{host} = $host;
            $ssh{$host}{SSH} = Net::OpenSSH->new(%con_opts);
            ($ssh{$host}{STDIN}, $ssh{$host}{STDOUT}, undef, $ssh{$host}{PID}) =
                $ssh{$host}{SSH}->open_ex(\%opts, $script)
                or die "Error " . $ssh{$host}{SSH}->error;
        }

        ## Waitpid loop to clean up terminating OpenSSH sessions
    }

On each of the remote machines, some_script.pl kicks off processes such as:

    sub start_child() {
        my $pid = fork();
        if (!defined $pid or $pid < 0) {
            # ... Fork error
            die(...);
        }
        if ($pid > 0) {
            # In parent
            return;
        }
        # In child process
        exec("child script", args);
    }
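For reference, the fork/exec pattern described above can be sketched as a self-contained script. This is only an illustration under stated assumptions: the @child_pids array, the argument list passed to start_child, and the placeholder command (the running perl binary, $^X, exiting with status 3) are hypothetical and are not taken from the real some_script.pl.

```perl
use strict;
use warnings;
use POSIX ();

my @child_pids;    # hypothetical bookkeeping of started children

# Fork, exec the given command in the child, and record the PID in the parent.
sub start_child {
    my @cmd = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid > 0) {
        # In parent: remember the child so it can be signalled and reaped later
        push @child_pids, $pid;
        return $pid;
    }

    # In child: exec only returns if it fails
    exec { $cmd[0] } @cmd or warn "exec failed: $!";
    POSIX::_exit(127);
}

# Placeholder child: the running perl binary, exiting with status 3
my $pid = start_child($^X, '-e', 'exit 3');
waitpid($pid, 0);
my $status = $? >> 8;    # exit status of the child
```

Recording the PID in the parent is what makes the later kill and waitpid steps possible; the simplified code above returns without showing that step.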

Later, under certain conditions, some_script.pl sends a SIGTERM signal to all of the child processes it created:

    foreach (@child_pids) {
        kill "SIGTERM", $_;
    }

And all children are reaped as they exit:

    while(waitpid(...)) {
        # Reap the children
    }
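The signal-then-reap sequence can be reproduced in isolation with a minimal sketch like the following. It assumes two placeholder children that simply sleep (standing in for the real child scripts), and the %ended_by hash is a hypothetical name used only to show how each child ended.

```perl
use strict;
use warnings;
use POSIX ();

# Start two placeholder children that would otherwise sleep for a while.
my @child_pids;
for (1 .. 2) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        sleep 30;            # stand-in for the real child script
        POSIX::_exit(0);
    }
    push @child_pids, $pid;
}

# Ask every child to terminate.
kill 'TERM', $_ for @child_pids;

# Reap each child with a blocking waitpid and record how it ended.
my %ended_by;
for my $pid (@child_pids) {
    my $reaped = waitpid($pid, 0);
    next unless $reaped == $pid;
    $ended_by{$pid} = $? & 127;    # low 7 bits hold the terminating signal number
}
```

Checking $? after each waitpid distinguishes a child killed by SIGTERM from one that exited on its own, which may help when comparing the signalled case with the natural-completion case.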

And then the some_script.pl running on that server exits, and the OpenSSH session is closed.

Now, to the problem. When the SIGTERM signals are issued and the some_script.pl running on 'b' completes first, the OpenSSH session for 'b' is closed, and the OpenSSH session for 'a' then dies as well, regardless of whether the child processes on 'a' (which the some_script.pl on 'a' is still waiting for) have completed.

But, if the some_script.pl running on 'a' completes first, then the some_script.pl running on 'b' seems to complete as expected (and the OpenSSH sessions also behave as expected).

If the processes complete naturally (i.e. without any external signals), then the OpenSSH sessions do not terminate prematurely.

Why is the OpenSSH session for 'a' exiting before all of the child processes that some_script.pl is waiting for have completed?

Thank You Monks!

Here is the debug output from OpenSSH that shows the sessions exiting:

    debug2: client_process_control: accepted tty 0, subsys 0, cmd /usr/level1/sw/tmcvay/bin/PWL
    debug2: client_process_control: got fds stdin 6, stdout 7, stderr 8
    debug2: fd 6 setting O_NONBLOCK
    debug2: fd 7 setting O_NONBLOCK
    debug2: fd 5 setting O_NONBLOCK
    debug1: channel 0: new [client-session]
    debug2: channel 0: send open
    debug2: callback start
    debug2: client_session2_setup: id 0
    debug1: Sending environment.
    debug1: Sending env LANG = en_US.UTF-8
    debug2: channel 0: request env confirm 0
    debug1: Sending command: /path/bin/some_script.pl
    debug2: channel 0: request exec confirm 0
    debug2: callback done
    debug2: channel 0: open confirm rwindow 0 rmax 32768
    debug2: channel 0: rcvd adjust 2097152
    debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
    debug2: channel 0: rcvd eof
    debug2: channel 0: output open -> drain
    debug2: channel 0: obuf empty
    debug2: channel 0: close_write
    debug2: channel 0: output drain -> closed
    debug2: channel 0: rcvd close
    debug2: channel 0: close_read
    debug2: channel 0: input open -> closed
    debug2: channel 0: send close
    debug2: channel 0: is dead
    debug2: channel 0: garbage collecting
    debug1: channel 0: free: client-session, nchannels 1
    # open_ex: ['ssh','-O','exit','-T','-S','/home/.../.libnet-openssh-perl/b-3018-562986','--','b']
    debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 194.6 seconds
    debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
    debug1: Exit status -1
    # _waitpid(3388) => pid: 3388, rc:
    # open_ex: ['ssh','-O','exit','-T','-S','/home/.../.libnet-openssh-perl/a-3018-242049','--','a']
    debug1: channel 0: free: client-session, nchannels 1
    debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 195.0 seconds
    debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
    debug1: Exit status -1
    # _waitpid(3389) => pid: 3389, rc: