iamb has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, monks. I'm having problems with the Net::SSH2 module, and I seek your wisdom.

I am trying to use the normal poll, read method to exec commands and read output/errors from them. But I see very odd behavior with my code on every platform I've tried it on (various linux and Solaris), so I have to assume the problem is my code. Here is an example script:

#!/usr/bin/perl use warnings; use strict; use Net::SSH2; use Time::HiRes qw/gettimeofday/; my $cmd_timeout = 10; # In 250ms polls my $ssh = Net::SSH2->new(); $ssh->connect('HOSTNAME'); $ssh->auth_publickey('USER', 'KEYFILE.pub', 'KEYFILE') or die "auth failed: ".$ssh->error(); $ssh->blocking(1); my $channel = $ssh->channel() or die "channel failed: ".$ssh->error(); $channel->exec('(for i in $(seq 1 10); do sleep 1 && echo line number +$i ; done)'); my @poll = ({handle => $channel, events => [qw/in ext/]}); $ssh->blocking(0); my $ttl = $cmd_timeout; my ($rv, %std); while (!$channel->eof()) { my $read = 0; $ssh->poll(500, [@poll]); my $buf; for my $poll (@poll) { print "got ", join(":", keys %{$poll->{revents}}), " poll events\n +"; for my $ev (qw/in ext/) { next unless $poll->{revents}{$ev}; print "found event $ev\n"; if (my $r = $channel->read($buf, 8192, $ev eq 'ext')) { $read += $r; $std{$ev} .= $buf; } } } print "read is $read and ttl is $ttl and time is ", join(".", gettim +eofday()), "\n"; if ($read) { $ttl = $cmd_timeout; } else { --$ttl; } if ($ttl < 1) { print "I got a timeout :(\n"; last; } } $channel->wait_closed(); $rv = $channel->exit_status(); $channel->close(); @std{qw/in ext/} = map { ($std{$_} || '') } qw/in ext/; print "in is $std{in} ext is $std{ext} rv is $rv\n"

As you can see, I expect to implement a timeout on channels by counting 250ms polls that silently tick by...if I see more than 10 in a row, time out the connection. For testing purposes, I exec a command which will output a line of text once per second. It should not time out.

Now the strange thing is the output:

got value poll events read is 0 and ttl is 10 and time is 1264118862.120617 got value poll events read is 0 and ttl is 9 and time is 1264118862.624583 got in:value poll events found event in read is 14 and ttl is 8 and time is 1264118862.711258 got value poll events read is 0 and ttl is 10 and time is 1264118863.212584 got value poll events read is 0 and ttl is 9 and time is 1264118863.716592 got in:value poll events found event in read is 0 and ttl is 8 and time is 1264118863.735174 got in:value poll events found event in read is 0 and ttl is 7 and time is 1264118863.735380 got in:value poll events found event in read is 0 and ttl is 6 and time is 1264118863.735554 got in:value poll events found event in read is 0 and ttl is 5 and time is 1264118863.735787 got in:value poll events found event in read is 0 and ttl is 4 and time is 1264118863.735959 got in:value poll events found event in read is 0 and ttl is 3 and time is 1264118863.736132 got in:value poll events found event in read is 0 and ttl is 2 and time is 1264118863.736328 got in:value poll events found event in read is 0 and ttl is 1 and time is 1264118863.736500 I got a timeout :( in is line number 1 ext is rv is 0

The output shows that at first it's dandy. It polls for 250ms until there's output, reads it in, and starts 250ms polls again. Then when it gets a second poll response, the call to $ssh->read() gets nothing (though trying an $ssh->flush() here correctly returns the number of bytes, and clears the handle for the next poll!). It then continues to poll and return immediately, but fail to get anything from ->read().

Then more wierdness...if the timeout is removed (comment the call to last on line 38), it will enter a crazy loop of polling while the command runs on the remote host. It loops around poll, sees something on in, reads nothing from in, polls again, and so on for the 10 seconds that the remote command runs. If you let it run this way (and burn in your cpu), sure enough every so often the call to ->read() actually gets the data waiting. It pulls in something like 3 lines of input at a time. This makes me think it's some sort of input buffering, but I've no clue where it's happening. Another thing that makes me think buffering (and the reason for using the HiRes timer) is that when the read() call succeeds, it falls nearly right on the one second "tics" since starting the remote command. At almost exactly 10 seconds, a last read() will get the remaining data.

I first used this method without the timeout code, and didn't realize there was a problem because as explained in the previous paragraph, without the timeout stuff it completes normally and in a reasonable time. It's just the fact that if it's waiting on output, it enters this crazy poll/read loop. I don't want slowness of a remote host to hose me up.

More weirdness...I tried an if $r = read() { ... } else { usleep(250); } -- it stops burning up the cpu and only polls once per 250ms again, but then read() will only ever get the first line! It will also continue trying to poll/read forever (does not end after 10 seconds as it should). It's as if I have to hammer out the millions of read() calls to get anything from them at all.

I've been puzzling on this for a while. Please, if you're able, enlighten me. I will be happy to provide more information if requested.

I have worked around the problem for now by implementing the timeout by setting snd and rcv timeouts on the ssh socket. I would really like to get the poll/read working though, as I have plans to run multiple channels in parallel. It's also just plain bugging me. :)

Replies are listed 'Best First'.
Re: Net::SSH2 odd polling behavior
by salva (Canon) on Jan 22, 2010 at 09:19 UTC
    Linux, Solaris... if you are familiar with system programing under Unix, Net::OpenSSH will be far more easier to use (and BTW, reliable :-)

    On Solaris you will have to install the OpenSSH client.

    #!/usr/bin/perl use strict; use warnings; use Time::HiRes qw(time); use Net::OpenSSH; use Error; my $start = time; my $ssh = Net::OpenSSH->new($host, user => $user, master_opts => [-i => $key_file]); warn time - $start . ": connected!\n"; my $timeout = 0.250; my $output = ''; my $ofh = $ssh->pipe_out('(for i in $(seq 1 10); do sleep 1 && echo li +ne number $i ; done)'); my $fn = fileno $ofh; while (1) { my $rb = ''; vec($rb, $fn, 1) = 1; if (select($rb, undef, undef, $timeout) > 0) { my $bytes = sysread($ofh, $output, 1024, length $output); last if (!$bytes and (defined $bytes or $! != Error::EINTR())); warn time - $start . ": $bytes bytes read\n"; } else { warn time - $start . ": timeout!\n"; } } close($ofh); print "output:\n$output\n";

      I tried same thing to implement, a channel timeout, to clean shutdown the ssh2 connection, in a case of a backend failure.

      Probable the timeout should be implemented on the ssh2 socket level rather than the channel itself. I would be curious to see how the author of the thread, iamb, implemented the timeout setting snd and rcv timeouts on the ssh socket.

      At the moment I am using setsockopt($sock, SOL_SOCKET, SO_KEEPALIVE, 1)

      if($ssh2->connect($dest, $port)) { my $sock = $ssh2->sock; setsockopt($sock, SOL_SOCKET, SO_KEEPALIVE, 1) or die "fatal: cant set socket in SO_KEEPALIVE mode $!"; ... }