in reply to Flaky Server (IO::Socket and IO::Select Question)

Well, yanking your network cable could certainly cause TCP to take 8 minutes to decide that the link is down. But if select has said that you can read from the socket, then sysread shouldn't hang, even if you've yanked the network cable. I'm tempted to call that a bug in your TCP/IP stack at this point (though that seems unlikely with BSD).
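Here is a minimal sketch of the select-then-sysread pattern under discussion. It separates the outcomes the caller has to tell apart: data, EOF, read error, and timeout. The helper name `read_with_timeout` is hypothetical and not part of IO::Select. The demo uses a local `socketpair` so that no network is needed.

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# Hypothetical helper (not part of IO::Select): wait up to $timeout seconds
# for $sock to become readable, then read what is available with sysread.
# Returns one of:
#   ('data', $bytes)    - sysread returned some bytes
#   ('eof', undef)      - sysread returned 0: the peer closed the connection
#   ('error', $errstr)  - sysread returned undef
#   ('timeout', undef)  - nothing became readable within $timeout seconds
sub read_with_timeout {
    my ($sock, $timeout, $maxlen) = @_;
    my @ready = IO::Select->new($sock)->can_read($timeout);
    return ('timeout', undef) unless @ready;

    my $buf = '';
    my $n = sysread($sock, $buf, $maxlen);
    return ('error', "$!") unless defined $n;
    return ('eof', undef) if $n == 0;
    return ('data', $buf);
}

# Demo on a local socketpair, so no network is needed.
socketpair(my $mine, my $theirs, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($theirs, "hello");
my @first  = read_with_timeout($mine, 1, 512);     # ('data', 'hello')
my @second = read_with_timeout($mine, 0.2, 512);   # ('timeout', undef)
close $theirs;
my @third  = read_with_timeout($mine, 1, 512);     # ('eof', undef)
print join(' ', map { defined $_ ? $_ : 'undef' } @first, @second, @third), "\n";
```

When the cable is pulled, the peer sends neither a FIN nor a reset. In that case you would typically see only the `timeout` branch until TCP itself gives up, which fits the multi-minute delay mentioned above.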

The "Timeout" parameter probably only affects IO::Socket's own methods, so sysread probably won't honor it. However, when I looked for a "read" operation in IO::Socket and IO::Socket::INET, I was surprised to find only recv, which didn't appear to honor the timeout either. Still, IO::Socket does some tricky things, so the timeout might be handled in a way that isn't obvious. It might be worth trying $mcserver->recv(...) instead.

Some timeouts in Socket are (or at least used to be) handled via alarm, so you could go that route. However, a signal arriving at the wrong moment can corrupt Perl's internal state, which would eventually kill a long-running process. (Though "safe Perl signals" will likely appear in the next major release of Perl, perhaps sooner.)
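Here is a sketch of the alarm route, following the pattern shown under `alarm` in perlfunc. It assumes Perl 5.8 or later, where deferred ("safe") signals arrived. On such a Perl the handler runs at a safe point instead of in the middle of an opcode. The helper name `sysread_with_alarm` is hypothetical. Note that `alarm` only takes whole seconds.

```perl
use strict;
use warnings;

# Hypothetical helper: a blocking sysread bounded by alarm().
# Returns ($nbytes, $buf) on success ($nbytes == 0 means EOF),
# or an empty list if $secs seconds pass with nothing to read.
sub sysread_with_alarm {
    my ($fh, $secs, $maxlen) = @_;
    my ($n, $buf) = (undef, '');
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };   # "\n" keeps $@ exact
        alarm $secs;                                  # whole seconds only
        $n = sysread($fh, $buf, $maxlen);
        alarm 0;
        1;
    };
    alarm 0;                                          # never leave an alarm pending
    if (!$ok) {
        die $@ unless $@ eq "timeout\n";              # propagate unrelated errors
        return;                                       # timed out
    }
    die "sysread failed: $!\n" unless defined $n;
    return ($n, $buf);
}

# Demo with a pipe whose write end stays open, so the first read blocks.
pipe(my $reader, my $writer) or die "pipe: $!";
my @timed_out = sysread_with_alarm($reader, 1, 64);   # () after about 1 second
syswrite($writer, "ping");
my @got = sysread_with_alarm($reader, 1, 64);         # (4, 'ping')
```

On older Perls without deferred signals, the same code carries exactly the corruption risk described above.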

In the face of sysread blocking after select said it wouldn't, I might resort to a watchdog process. You could avoid having the watchdog depend on select by having the main process, for example, append one byte to an open file each time it reads or writes a packet (and truncate the file every M bytes). Then the watchdog could check the file's mtime or size every N seconds and, if it hasn't changed, kill the main process, start a new one, etc.
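Here is a sketch of that heartbeat-file watchdog. All names (`beat`, `heartbeat_token`, `watch`) and the use of SIGKILL are illustrative assumptions. The heartbeat file must be opened in append mode so that writes after a truncate start again at offset 0. `watch` assumes the watchdog is the parent of the main process, so that `waitpid` works.

```perl
use strict;
use warnings;

# --- Main process side ---
# Hypothetical helper: call after every packet read or written. $fh must be
# opened with '>>' so each write lands at the (possibly truncated) end.
sub beat {
    my ($fh, $max_bytes) = @_;
    syswrite($fh, '.') or die "heartbeat write failed: $!";
    if (-s $fh >= $max_bytes) {              # the "every M bytes" truncation
        truncate($fh, 0) or die "heartbeat truncate failed: $!";
    }
}

# --- Watchdog side ---
# A value that changes whenever the main process writes a heartbeat:
# size plus mtime (core stat's mtime has one-second resolution).
sub heartbeat_token {
    my ($path) = @_;
    my @st = stat($path) or return '';
    return "$st[7]:$st[9]";
}

# Hypothetical loop: $pid must be a child of this process, and $restart is
# a caller-supplied sub that starts a new main process and returns its pid.
sub watch {
    my ($path, $interval, $pid, $restart) = @_;
    my $last = heartbeat_token($path);
    while (1) {
        sleep $interval;                     # the "every N seconds" check
        my $now = heartbeat_token($path);
        if ($now eq $last) {                 # no packets for $interval seconds
            kill 'KILL', $pid;               # assumption: a hung process may ignore TERM
            waitpid($pid, 0);
            $pid = $restart->();
            $now = heartbeat_token($path);
        }
        $last = $now;
    }
}
```

The watchdog only calls `stat` and `sleep`, so it does not rely on select behaving correctly.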

        - tye (but my friends call me "Tye")

Re: (tye)Re: Flaky Server (IO::Socket and IO::Select Question)
by ginseng (Pilgrim) on Jun 13, 2001 at 02:02 UTC

    tye/Tye, thanks

    My original post may have been unclear. I said "hang," but I didn't mean that the program hung, only that it kept on trucking even though the motion controller was no longer there.

    To restate my problem: this code is a middle man. On opposite sides of its table are the client (the operator interface) and the server (the motion controller). If the client goes away, the middle man will kick the server out too. If the server goes away, the middle man will try to bring it back.

    In actuality, the server is going away (every couple of days on its own; every time I yank the network cable in test) and the middle man just keeps on listening to an empty chair. I want it to get off its butt and get the server back to the table.

    Does that make sense? I keep feeling like there should be a good way of doing this, and fearing there is not...

    ginseng
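One hedged way to stop listening to an empty chair in the setup ginseng describes is to treat prolonged silence from the server the same as a closed connection, and then reconnect. This only works if the motion controller is expected to send something regularly. That is an assumption here. If it doesn't, the middle man would have to poll it periodically, and the protocol for that is not known from this thread. The names `connect_server`, `read_server`, `serve_server`, and `$IDLE_LIMIT` are hypothetical.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# ASSUMPTION: the motion controller sends something (status, or replies to
# a periodic poll) at least every $IDLE_LIMIT seconds while it is alive.
my $IDLE_LIMIT = 10;

# Hypothetical helper: keep trying until a TCP connection succeeds.
# Timeout here bounds connect(), not later reads.
sub connect_server {
    my ($host, $port) = @_;
    while (1) {
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $port,
            Proto    => 'tcp',
            Timeout  => 5,
        );
        return $sock if $sock;
        warn "connect to $host:$port failed: $@; retrying\n";
        sleep 1;
    }
}

# Hypothetical helper: return the next chunk from the server, or undef if
# the server closed the connection, errored, or was silent for $idle_limit
# seconds. All three are treated as "the chair is empty".
sub read_server {
    my ($sock, $idle_limit) = @_;
    my @ready = IO::Select->new($sock)->can_read($idle_limit);
    return undef unless @ready;              # silent too long
    my $n = sysread($sock, my $buf, 4096);
    return undef unless $n;                  # 0 = EOF, undef = error
    return $buf;
}

# Sketch of the middle man's server-facing loop; $handle is a
# caller-supplied sub that forwards each chunk to the client.
sub serve_server {
    my ($host, $port, $handle) = @_;
    my $server = connect_server($host, $port);
    while (1) {
        my $data = read_server($server, $IDLE_LIMIT);
        if (defined $data) {
            $handle->($data);
        } else {
            close $server;
            $server = connect_server($host, $port);
        }
    }
}
```

The idle limit should be comfortably longer than the server's normal quiet periods. Otherwise the middle man will drop a healthy connection.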