mungohill has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

This one is doing my head in. The program below is a reduced test case of something bigger and even uglier, but it exhibits the same baffling behaviour.

I posted a rather badly prepared question along the same lines about a week ago and, shortly after, found a workaround that I can't live with any more. So I'm back looking for a solution.

There is a call to a forking sub 'fork_one' that can be placed in one of two positions in the program. In either position, the 'fork_one' routine behaves exactly as expected. However, when the call is made in the second position, weird stuff happens later on.

In the first position, the 'fork_one' ends up running in an orphaned process, nothing to do with the main daemon. In the second position, it is a child of the main daemon.

Regardless of the positioning of the call to 'fork_one', when the accept receives a connection it calls the forking sub 'fork_three' which does its business and dies.

Depending on the position of the 'fork_one' routine, the accept subsequently either works perfectly normally or just blocks/hangs. The connection is made by the client, but no SSL negotiation happens.

If you don't exit in fork_three, but allow it to return to the main loop and the 'accept', it will correctly accept the next connection. I only say this to say that the socket isn't fundamentally broken, not because I think this is a rational thing to do.

By now I've pretty much forgotten what it was I was actually trying to communicate, and in place of a real client I've been hitting it with:

openssl s_client -cert certs/capsclscert.pem -key certs/capscls_pk.pem -CAfile certs/cacert.pem -connect 10.250.3.242:6023

The test program is:

#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::SSL;
use POSIX ":sys_wait_h";

my $pidfilename = 'sftest';

# try monitor fork here:
#fork_one();
# ... and everything is ok

# become a daemon
my $pid;
if (!defined($pid = fork)) {
    die "Cannot daemonize myself: $! \n";
}elsif ($pid) {
    # I am the parent and the fork suceeded. Duck out now
    print "created daemon in $pid. I ($$) am ducking out\n";
    exit;
}

# Try the monitor fork here:
fork_one();
#... and the second call to accept hangs

print "I'm daemonized on $$ \n";

sub REAPER { {} until ( waitpid(-1, WNOHANG) == -1) }
$SIG{CHLD} =\&REAPER;

my $sock;
if(!($sock = IO::Socket::SSL->new(
        Listen          => 5,
        LocalAddr       => '10.250.3.242',
        LocalPort       => '6023',
        Proto           => 'tcp',
        SSL_verify_mode => 0x01,
        SSL_cert_file   => 'certs/capssvcert.pem',
        SSL_key_file    => 'certs/capssv_pk.pem',
        Reuse           => 1,
    )) ) {
    die "unable to create socket: ", &IO::Socket::SSL::errstr, "\n";
}

while (1 ) {
    my $s;
    print "$$ is listening...\n";
    if ( $s = $sock->accept() ) {
        fork_three($s);
    }
}

####################

sub fork_one {
    my $pid;
    if (!defined($pid = fork)) {
        die "Cannot make monitor process: $! \n";
    }elsif ($pid) {
        # I am the parent and the fork suceeded. Duck out now
        print "created monitor process in $pid. I ($$) am resuming \n";
        return;
    }
    print "I ($$) am going to loop around doing some fatuous monitoring process\n";
    while (1) {
        sleep 30;
        print "$$ is still monitoring\n";
    }
}

sub fork_three {
    my $s = shift;
    my $pid;
    if (!defined($pid = fork)) {
        die "Cannot make dummy event processor: $! \n";
    }elsif ($pid) {
        # I am the parent and the fork suceeded. Duck out now
        print "created dummy event processor in $pid. I ($$) am resuming \n";
        return;
    }
    print "I ($$) am going to do some dummy event process\n";
    $s->close(SSL_no_shutdown => 1);
    # my work is done...
    exit;
}

Replies are listed 'Best First'.
Re: Socket Hangs Revisited
by shmem (Chancellor) on Jun 08, 2007 at 23:22 UTC
    That's because of your signal handler. Your parent gets stuck endlessly in the waitpid loop
    sub REAPER { {} until ( waitpid(-1, WNOHANG) == -1) }

    since waitpid returns the PID of the deceased process, or -1 if there are no more child processes. See waitpid.

    Your waitpid call will not return -1 since there's the monitor process hanging around. So waitpid will collect the PID of the child, but then loop forever. If you kill the monitor process, you'll see that your parent returns to its socket to listen. Change your handler to

    sub REAPER { 1 until ( waitpid(-1, WNOHANG) > 0) }

    and all should be fine. (1 is enough since constructing a hashref each time through the loop doesn't make much sense).
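
For what it's worth, a common shape for a SIGCHLD handler is to keep reaping while waitpid hands back a positive PID, and to stop on either 0 or -1. The following is only a minimal sketch, assuming POSIX-style return values (0 while some children are still running); the name reap_children is made up for illustration and is not part of the original program.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

# Reap every child that has already exited, then return immediately.
# Stops when waitpid returns 0 (children exist, none have exited)
# or -1 (no children at all). Returns the list of PIDs it reaped.
# NOTE: reap_children is a hypothetical name, not from the original post.
sub reap_children {
    local ($!, $?);    # keep the handler from clobbering these for the main program
    my @reaped;
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        push @reaped, $kid;
    }
    return @reaped;
}

$SIG{CHLD} = sub { reap_children() };
```

With a long-lived sibling such as the monitor process still running, the loop ends as soon as waitpid reports nothing left to collect, so control goes back to whatever the parent was doing.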

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Of course, you're right about this, but you appear to have made a slightly wrong guess about the solution. I looked at the docco on waitpid (at the perl and c lib level) and couldn't find any specification of what waitpid returned if (a) there was a child process (b) it wasn't defunct and (c) you specified WNOHANG. Experimentally it turned out to be 0, which is kind of logical. So rather than checking against '>0', I'm checking against '== 0'.
      I liked your ever so discreet point about the empty hashref, lightly dismissing the possibility that I might have been crass enough to think I was specifying an empty block.
        I looked at the docco on waitpid (at the perl and c lib level) and couldn't find any specification of what waitpid returned if (a) there was a child process (b) it wasn't defunct and (c) you specified WNOHANG.

        Huh? From the waitpid section:

        waitpid PID,FLAGS
        Waits for a particular child process to terminate and returns the pid of the deceased process, or "-1" if there is no such child process. On some systems, a value of 0 indicates that there are processes still running. The status is returned in $?. If you say
        use POSIX ":sys_wait_h";
        #...
        do {
            $kid = waitpid(-1, WNOHANG);
        } until $kid > 0;
        then you can do a non-blocking wait for all pending zombie processes.

        Emphasis mine. So, if you happen to be on a system where 0 indicates running processes, your test is wrong. It is also wrong to test for $kid == 0 as waitpid returns the pid of the deceased process. Again in your ordering:

        • (a) waitpid returns the PID of the deceased process
        • (b) waitpid returns 0 on some systems
        • (c) WNOHANG means non-blocking waitpid (it should return if there's no PID immediately reported (hence the loop))
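
The three cases above can be checked by experiment. This is a rough sketch, assuming a POSIX-ish system where waitpid with WNOHANG returns 0 while a child is still running (the perldoc text only promises this "on some systems"); probe_waitpid is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

# Fork one child that just sleeps, then call waitpid at three moments.
# Returns (child pid, result while running, result after death,
# result once no children remain).
# NOTE: probe_waitpid is a made-up name for this sketch.
sub probe_waitpid {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) { sleep 60; exit 0 }      # child: hang around like the monitor

    my $running = waitpid(-1, WNOHANG);      # child alive, nothing to reap yet

    kill 'TERM', $pid;                       # end the child
    my $dead = waitpid($pid, 0);             # blocking wait: the deceased child's PID

    my $none = waitpid(-1, WNOHANG);         # no children left at all
    return ($pid, $running, $dead, $none);
}

my ($pid, $running, $dead, $none) = probe_waitpid();
print "child=$pid running=$running dead=$dead none=$none\n";
```

A loop condition that only ever matches one of these values can spin forever when waitpid keeps returning one of the other two.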

        --shmem

Re: Socket Hangs Revisited
by mungohill (Acolyte) on Jun 09, 2007 at 01:00 UTC
    Thanks for that. I'm sure you're right. When I got the test case all nice and clean and documented like that and posted on the site, I said to myself 'it's going to be the reaper getting hung' but I couldn't fathom why that late on a Friday. Thanks again.