in reply to ssh output is partial when using fork manager

Can you help me understand how Fork Manager works? I have about 180 servers, and the command output for each of those servers takes a few seconds (let's say 20) to complete on my screen. If I am not mistaken, the script attempts "parallel" execution immediately, but it is not really parallel. So we ssh to 180 nodes (that is fast) and then we start sending the ssh command. This starts the 240 s output. However, the processing has to jump from one child to the other, checking each until we get the character that indicates the end of output, which stops the timer. Does the ssh timer stop in between checks? I mean, it might take 20 seconds to reach the end of the output, but much longer until the processing returns to a specific child. To give you some extra info: I have been printing the time when each child finishes, and it is more than 240 s since the start, and most children finish at the same time. I will go ahead and experiment with increasing the timer in the meantime. Thanks for your feedback so far.
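For what it's worth, with fork-based parallelism each child is a separate OS process, so each child's ssh timeout runs on its own clock; the parent does not have to "visit" a child for its timer to advance. A minimal core-Perl sketch of that structure (host names and the stand-in for the remote command are placeholders):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each fork()ed child is an independent process: its own memory, its own
# clock, its own ssh session. The parent only collects them at the end.
my @hosts = ('node1', 'node2', 'node3');    # placeholder host names

my @pids;
for my $host (@hosts) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                        # child process
        # Real code would ssh to $host with its own timeout here.
        sleep 1;                            # stand-in for the remote command
        exit 0;
    }
    push @pids, $pid;                       # parent records the child pid
}

# The parent blocks here; the children have been running concurrently.
waitpid($_, 0) for @pids;
print scalar(@pids), " children finished\n";
```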

Replies are listed 'Best First'.
Re^2: ssh output is partial when using fork manager
by QM (Parson) on Jan 25, 2018 at 11:38 UTC
    With that many parallel processes all trying for network access, I have encountered limitations. I think it's not the host memory or the number of processes, but something deeper in the network drivers on the host. You can get a similar result by, for instance, trying to ping multiple hosts in parallel -- above a certain number of hosts, the network response goes horribly sluggish.

    For ethernet, complete congestion results in many retries, with each retry picking a random wait time from an ever larger window (see exponential backoff). So 200 parallel processes would have many ethernet collisions, and some small fraction would end up with the maximum backoff time. At some point normal ssh connections time out due to lack of activity, and drop.

    I had exactly this problem with a little script I wrote years ago, before I knew about Parallel::ForkManager and the like. At the time it didn't matter that I didn't get all of the responses, and it wasn't for any automated system, just my own whims on finding a remote host with certain conditions. (See the doc page for how to limit the number of parallel processes.)

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re^2: ssh output is partial when using fork manager
by Anonymous Monk on Jan 25, 2018 at 17:03 UTC
    Hello all, thanks for your feedback. I tried to minimize the number of processes running at the same time by introducing a delay in the loop before spawning a new process. This way the total number of processes running in parallel at any moment would be lower, since some would have finished before others started. It made my script a bit slower, but I had zero failures.
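The staggered-start approach described above could be sketched like this (the delay, host list, and concurrency cap are all placeholder values):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @hosts = map { "node$_" } 1 .. 5;       # placeholder host names
my $pm    = Parallel::ForkManager->new(3); # cap on concurrent children

for my $host (@hosts) {
    sleep 1;              # stagger the starts so fewer children overlap
    $pm->start and next;  # forks; the parent continues the loop
    # child: run the ssh command against $host here
    $pm->finish;          # child exits
}
$pm->wait_all_children;   # parent waits for the stragglers
print "all children done\n";
```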
      You can also go for Net::OpenSSH::Parallel which knows how to handle most of the issues you are facing by itself.

        A code example of "Some people find it easier to use Net::OpenSSH combined with Parallel::ForkManager, threads or Coro" would be handy. hint hint ;)

        Jason L. Froebe

        Tech Blog
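A minimal sketch of what Net::OpenSSH::Parallel usage might look like, assuming placeholder host names and the `workers` cap from its documentation; the run() call is commented out here because it would really open ssh connections:

```perl
use strict;
use warnings;
use Net::OpenSSH::Parallel;

# Sketch only: Net::OpenSSH::Parallel schedules connections itself,
# keeping at most `workers` of them active at a time.
my $pssh  = Net::OpenSSH::Parallel->new(workers => 20);
my @hosts = map { "node$_" } 1 .. 180;     # placeholder host names
$pssh->add_host($_) for @hosts;

# Queue the same command on every registered host ('*' selects all).
$pssh->push('*', command => 'uptime');

# $pssh->run;   # uncomment with real, reachable hosts
print scalar(@hosts), " hosts queued\n";
```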

      But that is precisely what Parallel::ForkManager is for!

      You tell it how many processes to run concurrently when you create the object, and then it takes care of never running more than that many at once, delaying the start calls as necessary.
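That limit can be sketched with a minimal Parallel::ForkManager example (the cap and host names are placeholders):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# The number passed to new() is the maximum number of children alive at
# once; start() blocks the parent whenever that limit is reached.
my $pm = Parallel::ForkManager->new(10);

for my $host (map { "node$_" } 1 .. 30) {  # placeholder host names
    $pm->start and next;                   # forks; parent moves on
    # child: ssh to $host and capture its output here
    $pm->finish(0);                        # child exits with status 0
}
$pm->wait_all_children;                    # parent collects the rest
print "done\n";
```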

Re^2: ssh output is partial when using fork manager
by Anonymous Monk on Jan 24, 2018 at 19:13 UTC
    OK, so I have changed the timer from 240 to 340. The script now succeeds on many more nodes. However, I now get a lot of errors: "SSHProcessError The ssh process was terminated. at diameter_Status_Script.pl line 123." That line is: $ssh->waitfor("#", 240);
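One defensive pattern is to wrap the waitfor() call in eval, so a terminated ssh process is logged and skipped rather than aborting the child. Sketched here with a hypothetical run_waitfor() standing in for $ssh->waitfor("#", 340):

```perl
use strict;
use warnings;

# run_waitfor() is a hypothetical stand-in for $ssh->waitfor(...), which
# dies with an SSHProcessError message when the ssh process is gone.
sub run_waitfor {
    my ($should_fail) = @_;
    die "SSHProcessError The ssh process was terminated.\n" if $should_fail;
    return 1;
}

for my $fail (0, 1) {
    my $ok = eval { run_waitfor($fail); 1 };
    if ($ok) { print "prompt seen\n" }
    else     { warn "host skipped: $@" }   # log the failure and move on
}
```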
      Have you tried reducing the number of parallel processes?

      Run top or your favorite OS monitoring tool to see what is going on in your system.