Hello Monks,
I am (still) working on using the Net::SSH::Expect module to connect to a list of servers, gather some information, and then exit, eventually sending a report via email. Part of this report is a list of servers that could not be reached via ssh. I am noticing that the script fails to reach several servers that I can connect to manually from the same host as the same user (ssh keys installed, no password necessary), and the subset of affected servers changes every time I run the script. I've experimented with the expect object's timeout option as well as the ssh timeout, and that has shortened the list of affected servers, but I'm still seeing some and I'm trying to figure out why.
Here is the object I'm creating:
    my $ssh = Net::SSH::Expect->new(
        binary     => "/usr/local/bin/ssh",
        host       => $serverlist[$host],
        user       => $user,
        raw_pty    => 1,
        log_file   => "/tmp/$serverlist[$host].log",
        timeout    => 4,
        ssh_option => "-o ConnectTimeout=8",
    );
I am running this from a Solaris 10 x86 host, where the default ssh is Sun's own implementation. I installed the OpenSSH package from sunfreeware.com so I could use the ConnectTimeout option (hence the specified binary). I seem to get better results with this, but as I mentioned before, I'm still not reaching all of the servers I should be able to connect to. I suspect it's an issue of timeouts, but I'm not sure where the breakdown is.
The output that I have is too extensive to post here, but basically I have a list of 460 servers. I should be able to connect to about 440 of them, yet I consistently connect to 420-430 of them, with that list of 10-20 servers changing each time. A manual connection always succeeds.
Thanks for any help you can provide.
Update:
I should mention that I am using Parallel::ForkManager to process these servers in parallel, so the run finishes much faster. When I do this with max_procs = 1, I have no issues. I'm not aware of a maximum limit on outgoing ssh connections (I know there is sometimes a limit of 10 unauthenticated incoming ssh connections on Solaris), so I'm still not sure where the problem is. I've tried max_procs values from 5 all the way up to 20, and the problem still exists in all cases.
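Since the failures only show up when running in parallel and the affected hosts change from run to run, one way to separate transient failures from hosts that are really unreachable is to retry inside each child and collect the failures in the parent. This is only a minimal sketch, not my actual script: `collect_failures` and the `$attempt` callback are hypothetical names, and the callback stands in for whatever does the Net::SSH::Expect work for one host and returns true on success.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical helper (not part of the original script).
# Runs $attempt->($host, $try) for every host in up to $max_procs children,
# retrying each host up to $tries times, and returns the hosts that never
# succeeded.
sub collect_failures {
    my ($hosts, $attempt, $max_procs, $tries) = @_;
    my @failed;
    my $pm = Parallel::ForkManager->new($max_procs);

    # Runs in the parent whenever a child is reaped; $ident is the host name
    # that was passed to start().
    $pm->run_on_finish(sub {
        my ($pid, $exit_code, $ident) = @_;
        push @failed, $ident if $exit_code != 0;
    });

    for my $host (@$hosts) {
        $pm->start($host) and next;    # parent: go on to the next host

        # Child: try the host, retrying after a short pause on failure.
        my $ok = 0;
        for my $try (1 .. $tries) {
            # A die inside the callback counts as a failed attempt.
            $ok = eval { $attempt->($host, $try) } ? 1 : 0;
            last if $ok;
            sleep 1 if $try < $tries;
        }
        $pm->finish($ok ? 0 : 1);      # child exits here
    }
    $pm->wait_all_children;

    my @sorted = sort @failed;
    return @sorted;
}
```

In the real script the callback would create the Net::SSH::Expect object for `$host` and return false when the connection cannot be established. Any host that still fails after every retry ends up in the returned list, which can then go into the report.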