in reply to Flaky Server (IO::Socket and IO::Select Question)

Ah, I think I understand now. Perhaps rather than IO::Select::can_read on your TCP handle, you should instead use IO::Select::select, and look for exceptions as well as readable handles (and also get the timeout). I'm not sure, but it seems like you might get an exception on a downed IP connection. And being able to query the state of all your handles on a timeout is nice, too.
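
A minimal sketch of that call, assuming a Unix-like system. The socketpair() is only a stand-in for your real TCP handle, and the names $near, $far and @idle are made up for illustration. Nothing is written to the far end, so the call waits out its timeout and returns an empty list, which is the moment to survey all of your handles:

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# Stand-in for the real TCP connection (assumption for this sketch).
socketpair(my $near, my $far, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $set = IO::Select->new($near);

# Arguments: read set, write set, exception set, timeout in seconds.
# Nothing has been sent from $far, so this waits for the full timeout.
my @ready = IO::Select::select($set, undef, $set, 0.25);

my @idle;
if (@ready) {
    my ($readable, undef, $exceptional) = @ready;
    printf "%d readable, %d with exceptions\n",
        scalar @$readable, scalar @$exceptional;
} else {
    # Empty list: timeout (or error). Query every registered handle.
    @idle = $set->handles;
    printf "timeout: no activity on %d handle(s)\n", scalar @idle;
}
```

Whether a downed connection actually shows up in the exception set is something to test, not something this sketch proves.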

Like you, I don't much like testing this occurrence using a failed sysread() call.

Interesting voting on my earlier post, BTW. Wonder if it was that I was wrong somehow or just that I offended the CGI orthodoxy here? No matter...

Re: Re: Flaky Server (IO::Socket and IO::Select Question)
by ginseng (Pilgrim) on Jun 13, 2001 at 07:03 UTC

    Ahah! printing perldoc IO::Select now...

    Researching IO::Select->select()

    Changing code...pretty major changes, getting an array of arrays, rather than a singular array in return...gotta handle both the array of handles ready to read, and the array of handles with problems...

    Testing code...

    ...fixing the stupid mistakes...

    ...forgot the "my" for this scope...

    ...it runs! but now it doesn't recognize errors on either link :(

    ...some more debugging...

    Okay. Kind of. The first thing I learned about IO::Select::select is that it has to be called differently. When I tried this: (code simplified slightly to show the pertinent parts)

    my $links = new IO::Select();
    $links->add($mcserver);
    $links->add(\*STDIN);
    ...
    while (my @allhandles = $links->select($links, undef, $links)) {
    the select call didn't block at all. Every time it hit, it returned an array of three arrays, all of which had no handles in them.

    Looking back at the perldoc IO::Select page, it says "Upon error an empty array is returned."

    TIP #1: The returned array is not empty. The arrays contained in the returned array are empty.

    So now I knew (rather, presumed) there was an error, so back to the docs I went. "'select' is a static method, that is you call it with the package name like 'new'." I finally figured out what that meant...my:

    while (my @allhandles = $links->select($links, undef, $links)) {
    should have been
    while (my @allhandles = IO::Select::select($links, undef, $links)) {
    and after I changed that, it blocked properly.

    TIP #2: 'Static' methods are called by package name, not by instance. (You probably knew that. I learned the hard way.) ;)
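
    Here is a rough sketch of the loop with the static call, handling both the readable list and the exception list. It assumes a Unix-like system; the socketpair() stands in for the real server connection, and $client, $peer, @seen and the 0.25 second timeout are invented for the example (my real loop has no timeout):

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# Stand-in for the real connection (assumption for this sketch).
socketpair(my $client, my $peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($peer, "status\n") or die "syswrite: $!";

my $links = IO::Select->new($client);
my @seen;

# Called by package name. The timeout lets this demo end once things go quiet.
while (my @allhandles = IO::Select::select($links, undef, $links, 0.25)) {
    my ($readable, undef, $exceptional) = @allhandles;

    # Handles reported in the exception set.
    for my $handle (@$exceptional) {
        push @seen, "exception";
        $links->remove($handle);
    }

    # Handles with something to read.
    for my $handle (@$readable) {
        if (sysread($handle, my $command, 1024)) {
            push @seen, "read: $command";
        } else {
            $links->remove($handle);    # EOF or read error
        }
    }
}

print @seen;
```

    The loop ends when select returns the empty list, which here means the timeout passed with nothing to do.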

    The other thing I changed was how I handled a disappearing client. I figured the error array would report an EOF, telling me the client is no longer present. (It was a nice try...) I did have a sysread like this:

    if (sysread $handle, $command, 1024) {
        ... do stuff
    } else {
        ... client went away...handle it.
    }
    I found it is still necessary :)

    TIP #3: A closing socket (at least from telnet) is not an error, as far as IO::Select::select is concerned. (Probably a very reasonable thing.)
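
    To illustrate, a small sketch (Unix-like system assumed, with a socketpair() standing in for the telnet client, and @events invented for the demo). A handle whose peer has closed simply becomes readable, and sysread then returns 0. An undef return is a real read error, with the reason in $!:

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# Stand-in for a client connection (assumption for this sketch).
socketpair(my $client, my $peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($peer, "bye\n") or die "syswrite: $!";
close($peer);    # orderly close, like quitting telnet

my $links = IO::Select->new($client);
my @events;

while ($links->count) {
    my @ready = IO::Select::select($links, undef, $links, 1);
    last unless @ready;    # timeout or error

    for my $handle (@{ $ready[0] }) {
        my $n = sysread($handle, my $buf, 1024);
        if (!defined $n) {
            push @events, "error: $!";    # genuine read failure
            $links->remove($handle);
        } elsif ($n == 0) {
            push @events, "eof";          # peer closed the connection
            $links->remove($handle);
            close($handle);
        } else {
            push @events, "data";
        }
    }
}

print join(", ", @events), "\n";
```

    So the sysread test stays, but checking for defined versus 0 at least separates "client went away" from "something broke".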

    So now I've got the basics covered (i.e. I'm back to where I started with can_read()), and I've just yanked the ethernet cable to simulate a flaky server. Minutes pass... I take a smoke break. I have the client do things I know will generate traffic to the (disconnected) server. Still, I get no valid errors :(

    Bummer.

    TIP #4: Just because you learned a lot doesn't mean your code is right...

    Maybe there are flags I should be setting? Maybe I need to build a programmatic shrine to St. Larry in my code?

    ginseng

      Yes, as I hinted elsewhere, it can take several minutes for TCP to complain in the slightest in the face of a machine that is completely unresponsive. Things like "ICMP host unreachable" (in the case of a smart router) and "connection reset" (in the case of a rebooting server) can hurry this along.

      So you need to put your own maximum silence time into your code based on what makes sense for your situation. Usually this involves coming up with some harmless "heartbeat" packets that can be exchanged. In the face of an existing protocol, you hope to find some nondestructive "get status" request that you can send if there has been no other reason to talk to the server in the past N seconds. Then you can reset the connection whenever you have not gotten anything from the server for 2*N seconds.
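
      A minimal sketch of that idea, assuming a Unix-like system. The socketpair() stands in for the server link, the peer deliberately never answers, and $N, the "STATUS" request and the state names are all hypothetical (a real protocol would supply its own harmless request and a much larger N):

```perl
use strict;
use warnings;
use IO::Select;
use Socket;
use Time::HiRes qw(time);

my $N = 0.2;    # heartbeat interval in seconds (tiny, for the demo only)

# Stand-in for the server link; $peer never replies (assumption).
socketpair(my $server, my $peer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $links      = IO::Select->new($server);
my $last_heard = time;    # last time anything arrived from the server
my $last_sent  = time;    # last time we sent anything to it
my $heartbeats = 0;
my $state      = 'up';

while ($state eq 'up') {
    my @ready = IO::Select::select($links, undef, undef, $N / 2);

    for my $handle (@ready ? @{ $ready[0] } : ()) {
        if (sysread($handle, my $reply, 1024)) {
            $last_heard = time;      # any traffic counts as a sign of life
        } else {
            $state = 'closed';       # EOF or read error
        }
    }

    my $now = time;
    if ($now - $last_heard >= 2 * $N) {
        $state = 'silent';           # caller would reset the connection here
    } elsif ($now - $last_sent >= $N) {
        syswrite($server, "STATUS\n");    # hypothetical "get status" request
        $last_sent = $now;
        $heartbeats++;
    }
}

print "link state: $state after $heartbeats heartbeat(s)\n";
```

      In real code, $last_sent would also be updated whenever you talk to the server for any other reason, so the heartbeat only goes out on an otherwise idle link.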

      BTW, the reason that TCP takes so long to notice a dead connection is that the protocol, by default, assumes that it can take up to 2 minutes for a packet to traverse the network. This means 4 minutes round trip and about 8 minutes to retry packets enough times that you decide to give up.

      In many (most) modern uses of TCP (at least those that don't involve dial-up users, non-terrestrial spacecraft, or carrier pigeons), this 2-minute max time is something like an order of magnitude longer than probably makes sense. You may check if your TCP stack supports configuring this value down to something more reasonable (but beware of changing this casually!).

              - tye (but my friends call me "Tye")