FloydATC has asked for the wisdom of the Perl Monks concerning the following question:

I'm using Net::SSH::Perl to run a long series of commands on a remote system, basically using 'ls -laR' and 'cat' to locate and transfer several hundred thousand files to the local system based on a complicated set of rules. (Yes, I know about SCP, but it must be done this way, trust me.)

The running time is expected to be several days, and this is fine with me.

The problem is that after exactly one hour and thousands of cmd() calls, everything just stops. No error messages, no nothing, the call just never returns. Any ideas what might be causing this and what to look for when troubleshooting?

When restarted, the script knows how to pick up where it left off. Everything works for another hour, and you get the idea. The hosts are on the same gigabit switch and IP subnet, and files are transferred all the time, so idle timeout is out of the question.
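
Since the symptom is a cmd() call that never returns, and the script can already resume after a restart, one possible stopgap while troubleshooting is to put a watchdog around each call so that a hang becomes an error the script can act on. The following is only a sketch: run_with_timeout is a hypothetical helper, not part of Net::SSH::Perl, and whether alarm can interrupt the blocked read inside the module depends on the platform and Perl build.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, but give up after $seconds.
# Returns (1, @results) on success, or (0, $error) on timeout
# or any other exception thrown by $code.
sub run_with_timeout {
    my ( $seconds, $code ) = @_;
    my @result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        @result = $code->();
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # make sure no alarm is left pending after a failure
    return $ok ? ( 1, @result ) : ( 0, $err || "unknown error\n" );
}

# Assumed usage with an already logged-in Net::SSH::Perl object $ssh:
#   my ( $ok, @out ) = run_with_timeout( 60, sub { $ssh->cmd($cmd) } );
#   die "cmd() hung or failed: $out[0]" unless $ok;
```

On a timeout the SSH object should probably be treated as unusable, so reconnecting (or restarting and resuming, as the script already does) is the safer follow-up.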

Re: Net::SSH::Perl hangs after 1 hour
by atcroft (Abbot) on May 24, 2009 at 00:51 UTC

    First of all, have you tried using the { debug => 1, } parameter on connecting to see what it gives you?

    Also, exactly 1 hour (3600 seconds) sounds familiar. Could it possibly involve the key exchange that occurs periodically?

    Just a few thoughts. HTH.

    Update (2009-05-23): In looking at the sshd_config file on one of my local machines, I found the following lines, which reinforce my thoughts about the key exchange interval (albeit commented out in that system):

    # Lifetime and size of ephemeral version 1 server key
    #KeyRegenerationInterval 1h
    #ServerKeyBits 768

    So I guess the next questions are: 1) are you using SSH-1, 2) what is the key exchange interval being used by the server, and 3) what does the debug output suggest? HTH.
      I'm forcing protocol version 2 since this is the only one enabled at the remote end. Is this issue exclusive to version 1?
      Debug results are in:
      05.24 08:29 /var/spool/imap/e/user/eler/10. => /home/vmail/eler/cur/10.ads-db20
      dmz-webmail: channel 9530: new [client-session]
      dmz-webmail: Requesting channel_open for channel 9530.
      dmz-webmail: Entering interactive session.
      dmz-webmail: Sending command: cat "/var/spool/imap/e/user/eler/10."
      dmz-webmail: Requesting service exec on channel 9530.
      dmz-webmail: channel 9530: open confirm rwindow 0 rmax 32768
      dmz-webmail: channel 9530: window 0 sent adjust 32768
      dmz-webmail: channel 9530: window 0 sent adjust 32768
      dmz-webmail: channel 9530: window 0 sent adjust 32768
      dmz-webmail: Warning: ignore packet type 20
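
The last debug line is worth decoding. In SSH-2, message number 20 is SSH_MSG_KEXINIT (RFC 4253), the message that opens a key exchange or re-exchange. An ignored packet of that type is consistent with the rekeying theory above, although the log alone does not prove that this is what stalls the session. A small lookup like the one below can help when reading such warnings; the table is a subset of the numbers assigned in RFC 4253 and RFC 4254, and describe_packet_type is a hypothetical helper, not a Net::SSH::Perl function.

```perl
use strict;
use warnings;

# Subset of SSH-2 message numbers (RFC 4253 transport, RFC 4254 connection).
my %SSH2_MSG = (
    1  => 'SSH_MSG_DISCONNECT',
    2  => 'SSH_MSG_IGNORE',
    3  => 'SSH_MSG_UNIMPLEMENTED',
    4  => 'SSH_MSG_DEBUG',
    20 => 'SSH_MSG_KEXINIT',
    21 => 'SSH_MSG_NEWKEYS',
    90 => 'SSH_MSG_CHANNEL_OPEN',
    91 => 'SSH_MSG_CHANNEL_OPEN_CONFIRMATION',
    93 => 'SSH_MSG_CHANNEL_WINDOW_ADJUST',
    94 => 'SSH_MSG_CHANNEL_DATA',
    96 => 'SSH_MSG_CHANNEL_EOF',
    97 => 'SSH_MSG_CHANNEL_CLOSE',
    98 => 'SSH_MSG_CHANNEL_REQUEST',
);

# Hypothetical helper: translate a numeric packet type from the debug log.
sub describe_packet_type {
    my ($type) = @_;
    return exists $SSH2_MSG{$type} ? $SSH2_MSG{$type} : "unknown ($type)";
}

print describe_packet_type(20), "\n";    # prints SSH_MSG_KEXINIT
```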
Re: Net::SSH::Perl hangs after 1 hour
by Khen1950fx (Canon) on May 24, 2009 at 02:46 UTC
    I'm inclined to agree with atcroft. You can set the KeyRegenerationInterval in the script:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use Net::SSH::Perl;

    my $host     = 'host';
    my $username = 'user';
    my $password = 'password';
    my $cmd      = 'ls -laR /some/directory';

    my $ssh = Net::SSH::Perl->new(
        $host,
        protocol => '1,2',
        debug    => 1,
        options  => ["KeyRegenerationInterval 1h"]
    );
    $ssh->login( $username, $password );
    $ssh->session_id;
    my ( $stdout, $stderr, $exit ) = $ssh->cmd($cmd);
    print $stdout, "\n";

      I have enabled debugging and key regeneration to see if it helps. It's 7 am on a Sunday morning and I'm working from home... WTF ;-)
Re: Net::SSH::Perl hangs after 1 hour
by shmem (Chancellor) on May 23, 2009 at 22:05 UTC
    so idle timeout is out of the question

    How is "idle timeout" determined? The connection is bi-directional, and for both directions there's a sending and a receiving side, so you have four possibilities where a timeout might slam the door.

      I'm sending hundreds, if not thousands, of commands per second and processing the results; this is why I consider timeouts a non-issue.
Re: Net::SSH::Perl hangs after 1 hour
by salva (Canon) on May 24, 2009 at 09:33 UTC
      Yes, if I can't get Net::SSH::Perl to work properly, a rewrite for Net::SSH2 seems like a logical next step. It's not like I'm married to this particular module.