in reply to Socket connect curiosity

Hmm... that is a good question. It is surely due to the timing of events as the target system goes through its startup.

When the operating system first boots, it takes a while to load the NIC driver. Before that point the machine won't respond at all to the initial SYN packet sent by connect(). Once the NIC is initialized, the kernel answers a SYN for a port with no listener with a RST packet, which connect() reports as "connection refused". Firewall rules can change this: iptables can be set to silently drop connections to ports that are not open, sending no response at all, or to reject them with an ICMP port unreachable message instead of a RST. The iptables rules should be loaded before the NIC is brought up, but could be loaded afterwards if your system startup was set up by someone with odd tastes.
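If you want to see which of these phases the target is in, the errno left behind by a failed connect() is a reasonable hint. Below is a minimal sketch, not taken from the original script: the sub names classify_connect_error and try_connect are made up for illustration, and the mapping from errno to cause is an assumption based on typical Linux behaviour (other platforms may report things differently).

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM inet_aton sockaddr_in);
use Errno qw(ECONNREFUSED ETIMEDOUT EHOSTUNREACH ENETUNREACH);

# Hypothetical helper: turn a numeric errno from connect() into a guess
# about what the target machine is doing. The mapping is an assumption.
sub classify_connect_error {
    my ($errno) = @_;
    return 'refused: host is up but nothing is listening on the port'
        if $errno == ECONNREFUSED;
    return 'no reply: host still booting, or a firewall is dropping the SYN'
        if $errno == ETIMEDOUT;
    return 'unreachable: no route, or an ICMP unreachable came back'
        if $errno == EHOSTUNREACH || $errno == ENETUNREACH;
    return "other error (errno $errno)";
}

# Hypothetical helper: make one blocking connect attempt and describe the result.
sub try_connect {
    my ($host, $port) = @_;
    my $ip = inet_aton($host) or return "cannot resolve $host";
    socket(my $sock, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
        or return "socket failed: $!";
    my $ok    = connect($sock, sockaddr_in($port, $ip));
    my $errno = $! + 0;    # save errno before close() can overwrite it
    close($sock);
    return $ok ? 'connected' : classify_connect_error($errno);
}
```

Calling something like print try_connect($host, 22), "\n" once per retry should show the answer moving from "no reply" to "refused" to "connected" as the box comes up, if the explanation above is right.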

Once sshd starts, it gets to decide whether to accept incoming connections or to reset them, and it may (I'm not sure) reset them until it finishes its initialization. Also, if sshd is set to log incoming connections, there can be a delay of several seconds the first time it tries to resolve the client's IP address to a DNS name (to put into the log entry) and doesn't get a response from the DNS server.

If you think it is worth some effort to track down, say if it may be related to an actual problem you are seeing, you might want to get a packet sniffer and see what packets are actually being exchanged. Normally I would just suggest running Wireshark on the server, but it won't work properly in this case... it doesn't like to be started on a NIC that isn't initialized. The simplest thing would probably be to connect a separate packet sniffer, which could just be another machine running Wireshark, connected in parallel with the SSH server box using a hub or the port-spanning function on a switch.

Re^2: Socket connect curiosity
by monarch (Priest) on Dec 01, 2007 at 11:51 UTC
    This is one of those areas where you can help yourself by isolating where the problem is. Try to think of all the places in your code where delays could occur. The sleep() function is one, definitely - and maybe you suspect your sleep() function is broken!

    How shall we determine if sleep() is sleeping for a lot more than your prescribed 3 seconds? Let's try some debugging messages! e.g.

        else {
            $attempts++;
            print("Trying again ... ($attempts)\n");
            print( scalar(localtime) . " About to sleep\n" );
            sleep(3);    # sleep 3 seconds before trying again
            print( scalar(localtime) . " Finished sleep\n" );
        }

    I would bet London to a brick you'll discover the sleep function is just fine. So what else could cause delays in your loop? That connect() function gets called every time, so how about we put some debugging messages around that? e.g.

        print( scalar(localtime) . " About to connect\n" );
        $connected = connect(SOCK, $paddr);
        print( scalar(localtime) . " Finished connect\n" );
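    One limitation of localtime() for this kind of debugging is its one-second resolution. As a possible refinement (not part of the original suggestion), here is a small sketch using the core Time::HiRes module. The helper name timed is made up for illustration; it just runs a code reference and reports how long it took.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);    # sub-second versions of time() and sleep()

# Hypothetical helper: run $code, print the elapsed wall-clock time to
# STDERR, and return both the code's result and the elapsed seconds.
sub timed {
    my ($label, $code) = @_;
    my $start   = time();
    my $result  = $code->();
    my $elapsed = time() - $start;
    printf STDERR "%s took %.3f s\n", $label, $elapsed;
    return ($result, $elapsed);
}
```

    In the loop it could be used as my ($connected) = timed('connect', sub { connect(SOCK, $paddr) }); and likewise around the sleep(3) call.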

    To make your life a bit easier you might want to autoflush messages so that they appear as soon as you print them. Stick the following at the top of your script:

        select( ( select(STDOUT), $| = 1 )[0] );
        select( ( select(STDERR), $| = 1 )[0] );

    Next you'll probably ask how you can implement a time-out mechanism on your connect. That's a two-part problem: controlling how long the DNS lookup takes (when DNS breaks, resolving addresses can take a long time) and controlling how long the connect will wait. For now I won't advise how to deal with this problem; I hope I've given you some insight into how to determine what problem you're actually facing.
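    For the second half of that problem only, one possible starting point is the Timeout option of the core IO::Socket::INET module. This is a sketch under assumptions rather than a recommendation from the reply above: the sub name connect_with_timeout is made up, and it expects a numeric IP address so that no DNS lookup is involved (the lookup delay would have to be handled separately).

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: try to open a TCP connection to a numeric IP,
# giving up after $timeout seconds. Returns the socket on success or
# undef on failure, in which case $@ holds the reason.
sub connect_with_timeout {
    my ($ip, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $ip,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,    # seconds to wait for the connect
    );
    return $sock;
}
```

    A retry loop could then call connect_with_timeout($ip, 22, 3) and print $@ whenever it gets undef back.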