in reply to Re: Re: Flaky Server (IO::Socket and IO::Select Question)
in thread Flaky Server (IO::Socket and IO::Select Question)
Yes, as I hinted elsewhere, it can take several minutes for TCP to complain in the slightest when the machine at the other end is completely unresponsive. Things like "ICMP host unreachable" (in the case of a smart router) and "connection reset" (in the case of a rebooting server) can hurry this along.
So you need to put your own maximum silence time into your code based on what makes sense for your situation. Usually this involves coming up with some harmless "heartbeat" packets that can be exchanged. In the face of an existing protocol, you hope to find some nondestructive "get status" request that you can send if there has been no other reason to talk to the server in the past N seconds. Then you can reset the connection whenever you have not gotten anything from the server for 2*N seconds.
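Here is a rough sketch of that N / 2*N rule, in case it helps. The names `next_action` and `watch_connection` are made up for illustration, `$ping` stands in for whatever harmless "get status" request your protocol offers, and the one-second poll interval is an arbitrary choice. Treat it as a starting point under those assumptions, not a finished client.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: given the current time, the time we last sent
# anything, the time we last heard anything, and the heartbeat
# interval $n (all in seconds), say what to do next.
sub next_action {
    my ($now, $last_sent, $last_heard, $n) = @_;
    return 'reset'     if $now - $last_heard >= 2 * $n;  # silent too long
    return 'heartbeat' if $now - $last_sent  >= $n;      # nothing sent lately
    return 'wait';
}

# Hypothetical loop: $sock is an already-connected IO::Socket.
# Returns 'closed' or 'reset' so the caller can decide how to reconnect.
sub watch_connection {
    my ($sock, $ping, $n) = @_;
    local $SIG{PIPE} = 'IGNORE';          # a dead peer should not kill us
    my $sel = IO::Select->new($sock);
    my $last_sent = my $last_heard = time;
    while (1) {
        if ($sel->can_read(1)) {          # wake up at least once a second
            my $got = sysread($sock, my $buf, 4096);
            return 'closed' unless $got;  # EOF or read error
            $last_heard = time;
            # ... hand $buf to your real protocol handler here ...
        }
        my $action = next_action(time, $last_sent, $last_heard, $n);
        return 'reset' if $action eq 'reset';
        if ($action eq 'heartbeat') {
            return 'reset' unless defined syswrite($sock, $ping);
            $last_sent = time;
        }
    }
}
```

If your code also sends real requests on the same socket, it should update `$last_sent` there too, so that heartbeats only go out when the connection has otherwise been idle.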
BTW, the reason that TCP takes so long to notice a dead connection is that the protocol, by default, assumes that it can take up to 2 minutes for a packet to traverse the network. This means 4 minutes round trip and about 8 minutes to retry packets enough times that you decide to give up.
In many (most) modern uses of TCP (at least those that don't involve dial-up users, non-terrestrial spacecraft, or carrier pigeons), this 2-minute max time is something like an order of magnitude longer than probably makes sense. You may want to check whether your TCP stack supports configuring this value down to something more reasonable (but beware of changing this casually!).
- tye (but my friends call me "Tye")