in reply to Socket connect curiosity

Thanks quester and monarch,

Running a sniffer I out of the question given the environment I operate in. I did try monarch's suggestions and here's the output:

--> Rebooting ... main::p_wait_alive --> Mon Dec 3 09:48:49 2007 Before nap main::p_wait_alive --> Mon Dec 3 09:48:52 2007 After nap main::p_wait_alive --> Connecting to port 22 ... main::p_wait_alive --> Mon Dec 3 09:48:52 2007 Before connect main::p_wait_alive --> Mon Dec 3 09:50:28 2007 After connect main::p_wait_alive --> Trying again ... (1) main::p_wait_alive --> Mon Dec 3 09:50:28 2007 Before sleep main::p_wait_alive --> Mon Dec 3 09:50:31 2007 After sleep main::p_wait_alive --> Mon Dec 3 09:50:31 2007 Before connect main::p_wait_alive --> Mon Dec 3 09:50:34 2007 After connect main::p_wait_alive --> Trying again ... (2) main::p_wait_alive --> Mon Dec 3 09:50:34 2007 Before sleep main::p_wait_alive --> Mon Dec 3 09:50:37 2007 After sleep main::p_wait_alive --> Mon Dec 3 09:50:37 2007 Before connect main::p_wait_alive --> Mon Dec 3 09:50:46 2007 After connect main::p_wait_alive --> SSH-1.99-OpenSSH_4.2 online main::reboot --> HostB is back online
A few explanations ... instead of print is have my own subroutine for debug messages. The behavior is controlled by a flag (print or do not print anything) and the structure of a debug message is module::subroutine (where is it printed from) and on the new line is the message prefixed by --> (hope this makes the blurb above somewhat more friendly).

Using non-buffered output is in my opinion the key element for the difference between what I had noticed before and what I witnessed just now. I knew about the output buffering problem in general in software development, I just didn't know how/if Perl was addressing it and how to handle that (I also must admit I had ignored it for this problem until you pointed it out for me - thanks). Live and learn :)

Anyway, going back to the problem, the nap period is a 3 second sleep I spend before going into my loop to make sure the other host has time to go down before I try to connect such that I won't "successfully" connect to a dieing server and declare the host to be online (sure, I could re-architect my loop to sleep first, try to connect later, but that's besides the point here).

It's clear that the first connect is taking forever to timeout and return (that's what I think happens after my nap, as I start looping), then it seems the other connect calls take 3 seconds to time out. It appears the sleep calls take the desired 3 seconds.

In conclusion, after turning off output buffering I iterate only about 3 times, the first connect takes a very long time and then each connect takes 3 seconds to time out and all sleep calls behave as expected.

OK while typing the paragraph above I also thought about running the script with the debug messages but with the buffering turned on. Here's the interesting output:

--> Rebooting ... main::p_wait_alive --> Mon Dec 3 10:09:52 2007 Before nap main::p_wait_alive --> Mon Dec 3 10:09:55 2007 After nap main::p_wait_alive --> Connecting to port 22 ... main::p_wait_alive --> Mon Dec 3 10:09:55 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:10:43 2007 After connect main::p_wait_alive --> Trying again ... (1) main::p_wait_alive --> Mon Dec 3 10:10:43 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:10:46 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:10:46 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:10:49 2007 After connect main::p_wait_alive --> Trying again ... (2) main::p_wait_alive --> Mon Dec 3 10:10:49 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:10:52 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:10:52 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:10:55 2007 After connect main::p_wait_alive --> Trying again ... (3) main::p_wait_alive --> Mon Dec 3 10:10:55 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:10:58 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:10:58 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:11:01 2007 After connect main::p_wait_alive --> Trying again ... (4) main::p_wait_alive --> Mon Dec 3 10:11:01 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:11:04 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:11:04 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:11:07 2007 After connect main::p_wait_alive --> Trying again ... (5) main::p_wait_alive --> Mon Dec 3 10:11:07 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:11:10 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:11:10 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:11:13 2007 After connect main::p_wait_alive --> Trying again ... (6) main::p_wait_alive --> Mon Dec 3 10:11:13 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:11:16 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:11:16 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:11:19 2007 After connect main::p_wait_alive --> Trying again ... (7) main::p_wait_alive --> Mon Dec 3 10:11:19 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:11:22 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:11:22 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:11:25 2007 After connect main::p_wait_alive --> Trying again ... (8) main::p_wait_alive --> Mon Dec 3 10:11:25 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:11:28 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:11:28 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:11:31 2007 After connect main::p_wait_alive --> Trying again ... (9) main::p_wait_alive --> Mon Dec 3 10:11:31 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:11:34 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:11:34 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:11:37 2007 After connect main::p_wait_alive --> Trying again ... (10) main::p_wait_alive --> Mon Dec 3 10:11:37 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:11:40 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:11:40 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:11:43 2007 After connect main::p_wait_alive --> Trying again ... (11) main::p_wait_alive --> Mon Dec 3 10:11:43 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:11:46 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:11:46 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:11:49 2007 After connect main::p_wait_alive --> Trying again ... (12) main::p_wait_alive --> Mon Dec 3 10:11:49 2007 Before sleep main::p_wait_alive --> Mon Dec 3 10:11:52 2007 After sleep main::p_wait_alive --> Mon Dec 3 10:11:52 2007 Before connect main::p_wait_alive --> Mon Dec 3 10:11:54 2007 After connect main::p_wait_alive --> SSH-1.99-OpenSSH_4.2 online main::reboot --> HostB is back online
The only difference is the number of iterations. The first connect still takes a long time to time out, the rest is exactly the same, just more times...

Regards,

RaduH