in reply to Re^17: Net::OpenSSH loosing lines ins reply
in thread Net::OpenSSH loosing lines ins reply

Good morning Salva,


I think you lead us to the correct track... :-)


open OUT, ">", "./trace.out"; my $line =0; foreach (@cmdout) { $line++; my $bytes = print $line.": ".$_; print OUT "$line: bytes: $bytes, err: $! \n"; } close(OUT);


IF everything is fine trace looks like this for all lines:
... 1880: bytes: 1, err: Inappropriate ioctl for device 1881: bytes: 1, err: Inappropriate ioctl for device 1882: bytes: 1, err: Inappropriate ioctl for device 1883: bytes: 1, err: Inappropriate ioctl for device 1884: bytes: 1, err: Inappropriate ioctl for device ...
IF it starts to fail:
... 1891: bytes: 1, err: Inappropriate ioctl for device 1892: bytes: 1, err: Inappropriate ioctl for device 1893: bytes: , err: Resource temporarily unavailable 1894: bytes: , err: Resource temporarily unavailable 1895: bytes: , err: Resource temporarily unavailable 1896: bytes: , err: Resource temporarily unavailable 1897: bytes: , err: Resource temporarily unavailable ...


Does this ring a bell?
update caught it in strace:
write(1, "07651738\n1313: switchport acce"..., 4096) = 4096 write(1, ": interface Ethernet2/36\n1426: "..., 4096) = 4096 write(1, "ntrusted\n1539: service-policy "..., 4096) = 4096 write(1, "torm-control broadcast level 10."..., 4096) = 4096 write(1, "rol broadcast level 10.00\n1782: "..., 4096) = -1 EAGAIN (Re +source temporarily unavailable) write(1, "t level 10.00\n1894: service-po"..., 4096) = -1 EAGAIN (Re +source temporarily unavailable) write(1, "e-policy type queuing input N7K-"..., 4096) = -1 EAGAIN (Res +ource temporarily unavailable) write(1, "141: channel-group 1305 mode a"..., 4096) = -1 EAGAIN (Res +ource temporarily unavailable) write(1, "storm-control broadcast level 10"..., 4096) = -1 EAGAIN (Res +ource temporarily unavailable)

Replies are listed 'Best First'.
Re^19: Net::OpenSSH loosing lines ins reply
by salva (Canon) on Jun 05, 2014 at 08:55 UTC
    write(1, "torm-control broadcast level 10."..., 4096) = 4096 write(1, "rol broadcast level 10.00\n1782: "..., 4096) = -1 EAGAIN (Re +source temporarily unavailable)

    The only plausible explanation I can think of is OpenSSH ssh setting STDOUT into non-blocking mode. BTW, how are you capturing STDOUT?

    Add a dump of /proc/$$/fdinfo:

    open OUT, ">", "./trace.out"; for (0, 1, 2) { local $/; open my $fdinfo, '<', "/proc/$$/fdinfo/$_"; my $info = <$fdinfo>; print OUT "fdinfo $_:\n$info\n\n"; } my $line =0; foreach (@cmdout) { $line++; my $bytes = print $line.": ".$_; print OUT "$line: bytes: $bytes, err: $! \n"; } close(OUT);

    The non-blocking flag is 0x4000. You may also like to generate dumps of fdinfo before the constructor call and before and after calling capture.

      Hi Salva,
      on a normal run:
      ============================== start fdinfo 0: pos: 0 flags: 0100002 fdinfo 1: pos: 0 flags: 01 fdinfo 2: pos: 0 flags: 01 ============================== after new fdinfo 0: pos: 0 flags: 0100002 fdinfo 1: pos: 0 flags: 01 fdinfo 2: pos: 0 flags: 01 ============================== after capture fdinfo 0: pos: 0 flags: 0100002 fdinfo 1: pos: 0 flags: 04001 fdinfo 2: pos: 0 flags: 04001


      for a failed run:
      ============================== start fdinfo 0: pos: 0 flags: 0100002 fdinfo 1: pos: 0 flags: 01 fdinfo 2: pos: 0 flags: 01 ============================== after new fdinfo 0: pos: 0 flags: 0100002 fdinfo 1: pos: 0 flags: 01 fdinfo 2: pos: 0 flags: 01 ============================== after capture fdinfo 0: pos: 0 flags: 0100002 fdinfo 1: pos: 0 flags: 04001 fdinfo 2: pos: 0 flags: 04001

      so the same....

      doesn't mean any thing to me... :-(

      I capture output using "tee" or even simpler only using "wc -l".
      For really using the script it would be called by another script and stdout would be read by caller.
        As I had supposed, after the capture call, the O_NONBLOCK flag is set.

        The ssh command that is being run under the hood by capture is leaving STDOUT in non-blocking mode. The issue seems fixed in newer versions of OpenSSH, or at least I am unable to reproduce it with the latest one (6.6.1p1).

        In any case it is easy to workaround. Just using capture2 instead of capture or setting stderr_discard => 1 should make it go.

        BTW, there isn't any difference on the fdinfo dumps between the successfully and failed invocations because the problem is hidden behind a race condition. The perl process needs to be faster writing than tee reading for the intermediate pipe buffer to fill.