in reply to Failures in TCP/IP stack

Have you checked for memory growth over time?

Try running the main bulk of your code, the checks, in a standalone script that doesn't use preforking, and see whether that changes things.

If you can, just loop calling the checks as fast as your network admin will let you get away with: on a test network if one is available, or after hours or at weekends, whatever makes sense in your environment. Perhaps use only a single machine as the target of your tests, maybe your own. The idea is to run the components as fast as you can to artificially exacerbate the problem. If you can get the problem to occur in a reasonably short period of time, it becomes much easier to exclude the various subcomponents one at a time and isolate the fault that way.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Failures in TCP/IP stack
by hubb0r (Pilgrim) on May 29, 2006 at 02:14 UTC
    I'm thinking that there is actually a bug in either the DBI or in DBD::mysql that is causing the file descriptors to hang open on failed connections. Anyone have any ideas, or should I start looking through the DBI code to track it down?

      First off, try running (a copy of) your code with the DBI stuff commented out. If your problem goes away, you know that's where to look.

      Then, produce a cut-down version of your daemon script that contains only the DBI checks against a single test installation of MySQL, and try to reproduce the problem by setting the frequency as high as you can.

      At that point you should have a much smaller script that reproduces the problem much more quickly. If you then post that here, the DBI experts here (not me!) will be much more likely to take the time to review your shorter code and perhaps spot the problem or offer suggestions as to a way forward.

      I have a vague recollection that by default (back at version 3.something), MySQL hung on to unclosed connections for something like 900 seconds, and that there was a configuration option (at the server end) to have connections time out more quickly. My recollection may be wrong, and it probably wouldn't produce the symptoms you are seeing, but it is the kind of thing that those with good MySQL experience may spot for you once you have isolated the problem and posted a concise script that reproduces it.


        My conclusion that it was a DBI issue was based on commenting out all checks but DBI. I counted the orphaned file descriptors per pass, and they corresponded with the number of servers which were not in fact running mysql at the time.

        You are right though... I'll make a test case which illustrates that issue, and present that as a new question here.