Intro

Hello all. I'm having a rather peculiar problem, and while I know how to fix it (i.e., I changed the code and the problem went away), I don't understand WHY it was behaving poorly in the first place. The facts:
  1. Apache 1.3.41
  2. mod_perl 1.31
  3. Perl 5.10.1
  4. OS: CentOS 5.2 and RHEL 5.8 (same result on both)
Under mod_perl I have some code that makes a TCP connection to a server:
    eval {
        local $SIG{ALRM} = sub { die 'timed out' };
        alarm($connect_timeout);
        $socket = IO::Socket::INET->new(
            PeerAddr => $hostname,
            PeerPort => $port,
            Proto    => 'tcp',
        );
        alarm(0);
    };
I understand that using alarm like this is a common idiom, and I believe that when this code was originally written IO::Socket::INET did not properly honor its Timeout argument, which is why the original author used alarm.
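For readers who want to try the idiom outside of Apache, here is a minimal standalone sketch of the same alarm/eval pattern. The helper name with_alarm_timeout is made up for illustration and is not part of the original code; it only shows the general shape of the idiom and how the caller sees the 'timed out' error, not the mod_perl behavior described in this post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): run $code and give up
# after $seconds by dying with "timed out" from a SIGALRM handler.
sub with_alarm_timeout {
    my ($seconds, $code) = @_;
    my $result = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm($seconds);
        my $r = $code->();
        alarm(0);                 # cancel the alarm on the success path
        $r;
    };
    my $err = $@;
    alarm(0);                     # also cancel it if $code died on its own
    die $err if $err;             # rethrow so the caller can inspect the text
    return $result;
}
```

A caller would wrap this in its own eval and match on the error text, for example `if ($@ =~ /timed out/) { ... }`.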

The Problem

Basically, I have an Apache server running with 120 processes (the MaxClients directive). We have a healthcheck script on the server that tests several application-dependent things to verify that the server is in a generally good state. Two of those checks go over the network. Over time, looking at server-status and Ganglia, the server would show a steady rise in the number of processes stuck in the "W" ("sending reply") state. Running strace on one of those processes would reveal a call waiting forever: futex(0x46fa94, FUTEX_WAIT, 2, NULL. Eventually the server would, obviously, stop taking requests altogether and need a restart. What we determined was that a request would hit a connect timeout to the target host, and the next time that same process was reused it would hang forever in that wait state as soon as it tried to create another outbound socket.

The Fix

Relatively simple: now that IO::Socket::INET handles Timeout properly, I switched the code from using an external alarm to using the argument:
    eval {
        local $SIG{ALRM} = sub { die 'timed out' };
        # on any subsequent network/socket usage (if thread previously timed out)
        $socket = IO::Socket::INET->new(
            PeerAddr => $hostname,
            PeerPort => $port,
            Timeout  => $connect_timeout,
            Proto    => 'tcp',
        );
    };
Note that the text 'timed out' is used later in the script to drive some logic. This not only fixed the issue of processes being fubar'ed when creating a later socket after a timeout; the timeouts themselves just disappeared. In fact, to test it I had to point the code at a host that would drop packets in order to force a connection timeout.
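One thing worth noting about the Timeout approach: as far as I understand it, IO::Socket::INET->new does not die on failure but returns undef and leaves the reason in $@, so the result has to be checked explicitly. Below is a minimal sketch of that check. The helper name connect_with_timeout is hypothetical, and the exact wording of the error text depends on the installed IO::Socket version, so any later logic that matches on 'timed out' may need adjusting.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper (not from the original code): connect using the
# constructor's own Timeout argument instead of an external alarm.
sub connect_with_timeout {
    my ($hostname, $port, $connect_timeout) = @_;
    my $socket = IO::Socket::INET->new(
        PeerAddr => $hostname,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $connect_timeout,
    );
    # new() returns undef on failure; the reason is left in $@.
    die "connect to $hostname:$port failed: $@\n" unless $socket;
    return $socket;
}
```

Called as `my $socket = eval { connect_with_timeout($hostname, $port, $connect_timeout) };`, a failure then shows up in $@ just as it did with the alarm version.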

The Actual Question

What in the world is happening with the old approach that would cause this to happen to the processes? Google searches did me no good, and while I have a fix in place, I'd love to understand the why. Thanks, and let me know if I missed something that may be useful.

In reply to Apache Processes Hung on Socket Issue by eallenvii
