
Thanks. I posted here because I knew you (the author) frequented the site.

It could very well be how I've coded it, as I'm no expert by any means, but I do tend to know enough to be dangerous. I have an array of ~40 computers that I loop through continuously. The code posted below is run against each of those computers. I have excluded some mundane code (such as writing data to a file) for the sake of brevity.

# Create connection
my $SSH = Net::OpenSSH->new( $IP, user => 'user', password => 'pass' );
if ( !$SSH->error ) {
    # Is my process running?
    my $PROC = $SSH->capture( "ps aux | grep <process_name> | grep -v grep | wc -l | sed 's/ *//'" );
    if ( $PROC == 0 ) {
        # 3rd party program is run here via perl system method.
        # If it happens to fail, it has no adverse consequences on
        # the rest of the script.
        <run 3rd party program>
        if ( <my above 3rd party output condition is met> ) {
            # Read file on remote machine
            ( $DATA, $ERR ) = $SSH->capture2( "cat <file on remote machine>" );
            # Write/append $DATA to local file
            <write data to local file>
            # Delete the remote file
            $SSH->system( "rm -f <remote file>" );
            # And finally, since my remote process is not currently
            # running, spawn it on the remote machine
            $SSH->spawn( "./script.pl" );
        }
    }
}
else {
    print $SSH->error;
}

I know all machines can communicate and the main machine that runs this script also has every machine in its known_hosts.

My only assumption is that, since this happens for each of the 40 machines numerous times a day, OpenSSH (the program, not the module) is hitting some kind of limit due to the sheer number of connections being made, causing it to fail.

I have added the debug line you suggested and I will post back once it fails (hopefully today, but no later than tomorrow morning).

Re^3: Net::OpenSSH killing script
by salva (Canon) on Feb 03, 2011 at 11:17 UTC
    It seems that you are using spawn in order to create detached processes on the remote host, but it doesn't work that way.

    spawn forks a new local ssh process that keeps running in the background until the remote process exits. You have to take care of reaping those ssh processes with waitpid; otherwise zombies will pile up and at some point the OS will refuse to fork new processes. That is probably the reason your script is failing.
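
    If you do want to keep using spawn, the reaping step could be sketched roughly as follows. The helper name reap_finished and the @pids bookkeeping are hypothetical, made up for this illustration; the sketch only assumes that spawn returns the PID of the local ssh process.

    ```perl
    use strict;
    use warnings;
    use POSIX qw(WNOHANG);

    # Hypothetical helper: reap every child in @$pids that has already
    # exited and return the PIDs that are still running.
    sub reap_finished {
        my ($pids) = @_;
        my @running;
        for my $pid (@$pids) {
            # Non-blocking wait: returns $pid once the child has been reaped,
            # 0 while it is still running, -1 if there is no such child.
            my $r = waitpid( $pid, WNOHANG );
            push @running, $pid if $r == 0;
        }
        return @running;
    }

    # Usage inside the polling loop (assumes $ssh is a connected
    # Net::OpenSSH object and @pids persists between iterations):
    #   my $pid = $ssh->spawn("./script.pl");
    #   push @pids, $pid if defined $pid;
    #   @pids = reap_finished( \@pids );
    ```

    Calling the helper once per pass over the hosts should keep finished ssh processes from accumulating as zombies, without blocking the loop on the ones that are still running.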

    The right way to do that is to run the remote command with nohup:

    $ssh->system("nohup ./script.pl &");
    Though, as it seems that the remote command is actually a Perl script, you can also convert it into a daemon and let it take responsibility for going into the background itself. There are several CPAN modules that let you do that (e.g. Proc::Daemon).

    Besides that, there are other places where you can improve your Net::OpenSSH usage:

    # Read File on remote machine
    ( $DATA, $ERR ) = $SSH->capture2( "cat <file on remote machine>" );
    # Write/Append $DATA to local file
    <write data to local file>
    ...can just be written as...
    $ssh->system({stdout_file => ['>>', $local_file]}, cat => $remote_file);
    And in...
    my $PROC = $SSH->capture( "ps aux | grep <process_name> | grep -v grep | wc -l | sed 's/ *//'" );
    if( $PROC == 0 ) {...
    you should check that the command did not fail due to some SSH error. Otherwise, you could end up running several instances of ./script.pl.
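
    One way to make that check explicit is sketched below. The helper name remote_proc_count is hypothetical and not part of Net::OpenSSH; the only point is that $ssh->error is inspected after capture before the count is trusted, so a failed connection is never mistaken for "0 processes running".

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: return the number of matching remote processes,
    # or undef if the count could not be obtained.
    # $ssh is expected to be a connected Net::OpenSSH object; $pattern is
    # interpolated into the remote shell command unquoted, so it must be
    # a trusted, shell-safe string.
    sub remote_proc_count {
        my ( $ssh, $pattern ) = @_;
        my $out = $ssh->capture("ps aux | grep $pattern | grep -v grep | wc -l");
        if ( $ssh->error ) {
            warn "process check failed: " . $ssh->error . "\n";
            return;
        }
        # Accept only a plain number; anything else is treated as a failure.
        return ( defined $out && $out =~ /^\s*(\d+)\s*$/ ) ? $1 + 0 : undef;
    }

    # Usage:
    #   my $count = remote_proc_count( $SSH, '<process_name>' );
    #   if ( defined $count && $count == 0 ) { ... start ./script.pl ... }
    ```

    With this shape the caller starts the remote script only when the count is both defined and zero, and skips the host for this pass when the check itself failed.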

      Just wanted to thank you for the assistance. Using...

      $ssh->system("nohup ./script.pl &");

      ...fixed my problem. You were correct in your assumption of how I thought 'spawn' worked. Thank you!!