dd-b has asked for the wisdom of the Perl Monks concerning the following question:

ETA: see bottom, I've reproduced it in simple code, and know roughly what's wrong, but have questions about how to fix it still.

I thought I understood the basics of a forking server; I've certainly seen dozens of code samples. But the one I've written is hanging on the first $socket->read() call in the child (where that socket is the one created in the parent by the accept() call on the listening socket).

I'm running Perl 5.8.8 on Linux, so I should have autoflush set by default (setting it manually hasn't helped, anyway). I'm using IO::Socket::INET.

What I'm seeing exactly is that the child hangs in the read call until it gets a SIGINT (sent as part of the shutdown when I kill the test program), at which point the read call returns NOT with an error, but with the expected data.

The one thing I've noticed I'm doing differently from the sample servers is that mine reads one line in the parent, before forking the child. (It's getting information to do load balancing, based on the first line of the request; by doing that in the parent, I keep the load information conveniently in one place without needing IPC from child to parent. This is for a local protocol, and it needs to spread requests that used to go to one server across many, based on that first line, so an ordinary load-balancer package won't do what I need. And the call rate will be low, dozens a second rather than thousands.)

I'm wondering if there's magic I have to do to avoid conflicts between the parent and the child on that socket. Should I undef it on the parent after forking the child? I'm pretty sure closing it is wrong (tried it).

I'm also wondering if using $socket->getline() once followed by a string of $socket->read() is causing my problem. The doc seems to indicate they're compatible, at least by omission (it makes a point of saying sysread is NOT compatible with read/readline).

Any thoughts? All the online stuff I've found is either showing very simple servers, or else talking about abstruse details in a particular case.

The logs below show a client connection being accepted, the first line of the request read, routing decision made based on that data, child forked, server connection opened, and two lines sent to the server connection. Then, at 2011-10-12 13:16:35,254 it enters the $socket->read() call, and hangs; until I kick it (by shutting down the test program which sends signals to the proxy, as well as the test client and server). So, at 13:16:52,531 the data that was sent much earlier is read.

2011-10-12 13:16:34,972 DEBUG kcmdproxy(22168):509 Accepted connection 1 from 192.168.1.23:52674
2011-10-12 13:16:35,002 DEBUG kcmdproxy.io(22168):388 bufrdline: <<special,endofworld<lf>>>
2011-10-12 13:16:35,028 INFO kcmdproxy(22168):538 Routed "special,endofworld" to localhost.localdomain(127.0.0.1):2000
2011-10-12 13:16:35,034 DEBUG kcmdproxy.child(22171):564 In child
2011-10-12 13:16:35,051 DEBUG kcmdproxy.io(22171):435 Output: <<PROXY,192.168.1.23<lf>>>
2011-10-12 13:16:35,036 INFO kcmdproxy(22168):652 Forked child 22171
2011-10-12 13:16:35,052 DEBUG kcmdproxy.io(22171):435 Output: <<special,endofworld<lf>>>
2011-10-12 13:16:35,130 DEBUG kcmdproxy.child(22171):578 Copying request data from client to server
2011-10-12 13:16:35,170 DEBUG kcmdproxy.io(22171):404 sockcopy start keep 0
2011-10-12 13:16:35,214 DEBUG kcmdproxy.io(22171):338 bufrd
2011-10-12 13:16:35,254 DEBUG kcmdproxy.io(22171):344 bufrd a
2011-10-12 13:16:52,531 INFO kcmdproxy(22168):496 Accept returned EINTR
2011-10-12 13:16:52,563 DEBUG kcmdproxy.io(22171):349 bufrd res <<8>>: <<END,END<lf>>> err
2011-10-12 13:16:52,564 DEBUG kcmdproxy.io(22171):435 Output: <<END,END<lf>>>

Any ideas? Going quite batty here, and we needed this done last week.

I recreated the problem in a simple server. This server, when talked to by nc, reproduces my problem exactly (yay!):

#! /usr/bin/perl
use strict;
use warnings;
use IO::Socket;
use POSIX qw( :sys_wait_h :fcntl_h );
use Errno qw( EINTR EAGAIN );

my $testport = 8080;

# This is what runs in the child process
sub kidstuff {
    my $sock = shift;
    # Read and log whatever comes in
  BUFRD:
    while (1) {
        $! = 0;
        my $data;
        my $res = $sock->read($data, 99);
        # $res: #chars, or 0 for EOF, or undef for error
        die "read failed on $!" unless defined($res);
        last BUFRD if $res == 0;    # EOF
        print "Read($res): $data\n";
    }
}

$| = 1;    # autoflush

my $listener = IO::Socket::INET->new(
    LocalPort => $testport,
    Type      => SOCK_STREAM,
    Proto     => 'tcp',
    Reuse     => 1,
    Listen    => 5,
);
if (!defined($listener)) {
    die "Failed to listen on port $testport: $!";
}

CLIENT:
while (1) {
    my $client = $listener->accept();
    if (!defined($client)) {
        # Some kind of error
        if ($! == EINTR) {
            print "Accept returned EINTR\n";
            next CLIENT;
        }
        # If it's an undef other than EINTR, maybe not really a client
        die "Accept error: $!";
    }

    # Read first line from client
    my $l1 = $client->getline();
    die "client read error $!" unless defined($l1);
    print "Server, first client line is $l1\n";

    # Now fork server
    my $kid = fork();
    die "Fork failed" unless defined($kid);
    if ($kid == 0) {
        print "Child $$ running\n";
        kidstuff($client);
        print "Child $$ complete, exiting\n";
        exit 0;
    }

    # Parent continues here.
    while ((my $k = waitpid(-1, WNOHANG)) > 0) {
        # $k is kid pid, or -1 if no such, or 0 if some running none dead
        my $stat = $?;
        print "Reaped $k stat $stat\n";
    }
}    # CLIENT: while (1)

The problem is that $socket->read($buf, $size) doesn't return until EOF or until $size bytes have been seen. I thought of that days ago, and thought I had run a test to eliminate it in my complicated code; but in this simple code it's clear-cut (I suppose I should have gone for the simple case earlier?).

So, what's the solution? Non-blocking IO? As in $sock->Blocking(0)? What I want is a call that blocks until there is data to return, and then returns (but doesn't care about amount). I could do 1-character reads, and assemble the lines myself, but that seems really stupid-ass. If I set non-blocking IO, how do I distinguish EOF from no data? And how do I avoid spinning and wasting CPU resources? Do I have to resort to select() for something this simple?
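A minimal, self-contained demonstration of the difference (a Unix-domain socketpair stands in for the accepted client connection): sysread() returns as soon as some bytes are available, while read() keeps waiting for the full count:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# A connected pair of sockets stands in for client/server.
socketpair(my $rd, my $wr, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

syswrite($wr, "hello\n");            # only 6 bytes in flight

my $buf;
my $n = sysread($rd, $buf, 255);     # ask for up to 255 bytes...
print "got $n bytes: $buf";          # ...returns with the 6 on hand
```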

(Line-oriented IO is where I started. That turns out not to work because some of the existing clients omit the line terminator at the end of the session, and a line-read hangs waiting for them. And the whole point of this server is to let us redistribute the load, and reshuffle which functions are in which server, and how many copies are run of each server, without having to change clients and go through the whole deployment process again.)

Replies are listed 'Best First'.
Re: Socket descriptor passed across fork -- hanging
by Marshall (Canon) on Oct 13, 2011 at 07:27 UTC
    I looked back at some Perl code that I wrote to emulate one of my fork based C servers. This server used fixed length packets (256 bytes) but I hope that this will be useful for you to extend to the idea of an arbitrary length "\r\n" terminated packet. First the child's read and write subs, then some comments.
    $SIG{PIPE} = sub { close $active; exit(3) };

    sub readn {
        my ($socket, $bytes) = @_;
        my $offset = 0;
        my $buf    = "";
        while ($offset < $bytes) {
            my $nread;
            my $nleft = $bytes - $offset;
            $nread = sysread($socket, $buf, $nleft, $offset);
            kill 'PIPE', $$ unless (defined $nread);    ## undef is like -1 unix
            last if ($nread == 0);                      ## EOF
            $offset += $nread;
        }
        return ($buf, $offset);
    }

    sub writen {
        my ($socket, $buf) = @_;
        my $bytes  = length $buf;
        my $offset = 0;
        while ($offset < $bytes) {
            my $nwritten;
            my $nleft = $bytes - $offset;
            $nwritten = syswrite($socket, $buf, $nleft, $offset);
            kill 'PIPE', $$ unless (defined $nwritten);    # undef is like -1 unix
            $offset += $nwritten;
        }
        return $offset;
    }
    First off, there are some imperfections in the translation from the C way to the Perl way! Probably returning \$buf instead of $buf would be better, etc. But this code covers the three cases of a)EOF b)end of packet c)SIGPIPE (client died).

    When the client sends 256 bytes, readn() will often see two 128-byte receptions, even when client and server are on the same machine. In practice, you will see "hunks of data" whether the client sends one huge thing or a byte at a time. A couple of "trips" through sysread is no big deal.

    Note when I see undef from sysread(), I send SIGPIPE to myself! This is so "client went away" is handled consistently and "by normal method".

    There is no need to read one byte at a time even if you want to read "lines". Just check for network line termination at the end of the received bytes to see if you need to continue to loop for more bytes.
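    A rough sketch of that loop (read_line is an illustrative name, not from the code above), using this thread's "\n" terminator and a socketpair to keep it runnable:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# Accumulate sysread() chunks until the buffer ends in the terminator.
# Returns whatever was collected if the peer closes first (the
# unterminated-last-line case from this thread).
sub read_line {
    my ($sock) = @_;
    my $buf = '';
    until ($buf =~ /\n\z/) {
        my $n = sysread($sock, $buf, 4096, length $buf);
        die "read error: $!" unless defined $n;
        last if $n == 0;    # EOF
    }
    return $buf;
}

socketpair(my $rd, my $wr, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

syswrite($wr, "a complete line\n");
print "line: ", read_line($rd);

syswrite($wr, "no terminator");
close $wr;                            # peer goes away mid-line
print "tail: ", read_line($rd), "\n";
```

Note this checks only the end of the buffer, so if the peer has already sent part of the next line it comes along in the same return; a real implementation would keep the excess for the next call.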

    Note: network line termination is carriage return plus line feed ("\r\n"), NOT just "\n". In Perl, if you write a string to a socket, Perl will send the right thing. But if you are receiving bytes or sending raw bytes, it is up to you to do the "right thing".

    The above is not "finished code" for your specific app, but I hope it gives you a starting place. Your line oriented code that "does not hang" will be similar and about the same length.

    Update:

    I found a complete client/server application pair written in a)Perl and b)C for you. This is a demo forking server. The client is able to query "who" is on the server. What the client/server does is not that impressive. How it does it and the use of the normal signals is much more to the point.

    Perl Server:

    Perl Client:

    C Client/Server: The C code of the above is longer, much longer. Since this is a Perl forum, I am not sure that it would add value, but if requested, I will do it.

      Yeah, I know the line terminators are "wrong" in my server; it's for an existing local protocol with existing clients and servers, so I can't mess with that. So "\n" is "right" for this case, though non-standard.

      I see you're going down to the sysread level. That's probably the cleanest solution. I started out using line-mode reads, which was working fine until we uncovered the clients that didn't terminate their last line. Since the value proposition for this server is increasing our flexibility in deploying multiple copies of servers, bundling commands into servers dynamically, and so forth, without having to change and redeploy all the clients, that was kind of a quandary, so I've had to give up line-oriented IO. But because you're using fixed-length packets, you avoid the actual problem that trapped me.

      You say in the code that the "next" if accept returns undef isn't necessary any more, but I've seen my $listen->accept() call return EINTR (perl 5.8.8, Linux 2.6.18), so it seems to be necessary still here.

      I really appreciate a more sophisticated running example. I've been wondering about things like closing the handles not used after the fork, for example; you seem to think that's worth bothering with.

        I would like for you to explain what this means:

        "until we uncovered the clients that didn't terminate their last line"

        What?

        They closed the socket before sending \r\n?

        My code deals with that.

        because you're using fixed-length packets, you avoid the actual problem that trapped me
        No, not at all.

        I thought that I explained clearly how to deal with an indeterminate length \r\n terminated packet.
        What was not clear?
        Obviously something was not.
        It would help if you could ask the question in a different way.

        Please look at: sysread. sysread() will return with a number of bytes read.
        If the other side sends: "1234\r\n", just look at the end of the buffer to see if there is a line termination (last 2 bytes).
        What's the problem?
        I think that you can easily adapt my code to deal with your requirements.

Re: Socket descriptor passed across fork -- hanging
by zentara (Cardinal) on Oct 13, 2011 at 12:32 UTC
    Any thoughts?

    My mind's been a bit muddled lately ;-), and I only have grasped the tip of your iceberg, but it is reminiscent of using FIONREAD. Run perldoc -q filehandle and search for the section "How can I tell whether there's a character waiting on a filehandle?" It explains how to use FIONREAD. You can also look at my example of its use in IPC3 buffer limit problem. I hope it's useful to you.
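    A sketch of that FAQ recipe, assuming Linux's value for FIONREAD (0x541B below; sys/ioctl.ph via h2ph is the portable way to get the constant when it has been generated):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

my $FIONREAD = 0x541B;    # Linux value; prefer sys/ioctl.ph where available

socketpair(my $rd, my $wr, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($wr, "hello");

# Ask the kernel how many bytes are waiting, without reading them.
my $pending = pack("L", 0);
ioctl($rd, $FIONREAD, $pending) or die "ioctl: $!";
print "bytes waiting: ", unpack("L", $pending), "\n";
```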


Re: Socket descriptor passed across fork -- hanging
by dd-b (Pilgrim) on Oct 12, 2011 at 20:53 UTC

    Okay, non-blocking IO gives me an error, errno == EAGAIN, when no data is available. So then I can loop and busy-wait on data becoming available? Ick! So I put in a usleep(10), and that dropped the CPU use from 100% to 0% while waiting for input, which seems like it's low enough to not bother about too much (dozens of clients, remember; if it were thousands I might have to work harder). So this works in my test code, and it seems to work in my real code as well (test cases running as I type).
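    A self-contained sketch of that EAGAIN behavior (IO::Socket's socketpair class method stands in for the accepted connection): with blocking(0), "no data yet" comes back as undef with $! == EAGAIN, while EOF is a defined 0, so the two are distinguishable:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::UNIX;
use Socket qw( AF_UNIX SOCK_STREAM PF_UNSPEC );
use Errno qw( EAGAIN EWOULDBLOCK );

my ($r, $w) = IO::Socket::UNIX->socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$r->blocking(0);                  # non-blocking reads on this end

my $n = sysread($r, my $buf, 4096);
print "no data yet\n"
    if !defined $n && ($! == EAGAIN || $! == EWOULDBLOCK);

syswrite($w, "ping");
$n = sysread($r, $buf, 4096);
print "read $n bytes\n";          # data available: positive count

close $w;                         # peer closes the connection
$n = sysread($r, $buf, 4096);
print "EOF\n" if defined $n && $n == 0;   # EOF is 0, not undef
```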

    The other choice is select(), but select requires sysread rather than ordinary read, meaning I'd have to do my own buffering implementation for the cases where I have to read line-oriented data. That might be the solution if I needed to support thousands of clients, though, it'd probably waste less cpu horsepower than my usleep(10) does.
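    A minimal sketch of the select() route via IO::Select (sysread underneath, as noted; the socketpair is just to keep it runnable): can_read() blocks without spinning, and EOF also shows up as "readable":

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;
use IO::Select;

socketpair(my $rd, my $wr, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($wr, "some data\n");
close $wr;

my $sel = IO::Select->new($rd);
while ($sel->can_read) {          # blocks here instead of busy-waiting
    my $n = sysread($rd, my $buf, 4096);
    die "read error: $!" unless defined $n;
    last if $n == 0;              # EOF is also reported as "readable"
    print "chunk: $buf";
}
```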

    This should be core perl territory, and I feel like I'm struggling quite a lot due to the environment not making it easy. Am I missing a better way? Or do I just have to do this over and over until it becomes simple and comfortable in my head?

    The documentation could have saved me days of annoyance by being a bit clearer about how read() behaves.

Re: Socket descriptor passed across fork -- hanging
by Marshall (Canon) on Oct 13, 2011 at 10:29 UTC
    I'm wondering if there's magic I have to do to avoid conflicts between the parent and the child on that socket. Should I undef it on the parent after forking the child? I'm pretty sure closing it is wrong (tried it).

    See my code.

    After the fork(), the parent closes the active socket because the parent does not talk to anybody - it only listens for new connections.

    After the fork(), the child closes the passive socket because the child doesn't listen for new connections - it only talks on the active socket to the current client.
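    A self-contained sketch of that pattern (a socketpair plus fork stands in for accept(); the point is that each side must close the descriptor copy it doesn't use, or the other end never sees EOF):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

socketpair(my $svr, my $cli, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) {                  # child: plays the "server"
    close $cli;                   # drop the end the child doesn't use
    print {$svr} "done\n";
    close $svr;
    exit 0;
}
close $svr;                       # parent drops its copy; without this,
                                  # the read below would never see EOF
my @lines = <$cli>;               # reads until EOF
waitpid($pid, 0);
print "got: @lines";
```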

      I was using $socket->close(), rather than your close $socket. They should be equivalent, I thought, but I also thought I had trouble (closed on both sides) when I did the OO version. I've put in your version, and it hasn't broken anything (and will presumably be more stable in heavy use and under adverse conditions).

        close $socket provides exactly the same functionality, albeit slightly faster than $socket->close(), because a Perl object method call has more run-time overhead than a direct call.
Re: Socket descriptor passed across fork -- hanging
by dd-b (Pilgrim) on Oct 13, 2011 at 15:31 UTC

    So what does it mean when the following code:

    $! = 0;
    $res = $sock->read($data, 255);
    # $res: #chars, or 0 for EOF, or undef for error
    $err = $!;
    $iologger->debug("bufrd res " . safe($res) . ": " . safe($data) . " err $err");

    Logs this:

    2011-10-13 09:21:39,782 DEBUG kcmdproxy.child.io(30674):366 bufrd res <<127>>: <<simkserver<sp>30665<sp>port<sp>2001<sp>client<sp>127.0.0.1<sp>command<sp>special<lf>1,8,7,2,13,1,12<lf>13,5,11,10,8,0,13<lf>5,2,11,4,5,10,5<lf>3,5,7,10,12,2,12<lf>>> err Resource temporarily unavailable

    The socket call returned 127, which should be a number of characters (and looks about right). And it also set $! (which I had carefully set to 0 before the call). There isn't threading going on, and the signal handling is at the level of setting one variable in the handler, so nothing else should be interrupting to set $!. I suppose I could localize it, just as a test, to see if this still happens, but I don't see how something else can be getting in to set it. Does the $! value mean anything in this case, or is it just garbage?

      $! only has meaning if the operation failed. After a success its value could be anything.

      perldoc says: "If used numerically, yields the current value of the C errno variable, or in other words, if a system or library call fails, it sets this variable. This means that the value of $! is meaningful only immediately after a failure."
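      A tiny demonstration of that rule (names are mine, and a socketpair stands in for the real connection): check $! only when the call actually failed, i.e. returned undef; after a success its value is noise:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;

socketpair(my $rd, my $wr, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($wr, "data");

$! = 0;
my $n = sysread($rd, my $buf, 255);
if (defined $n) {
    # Success: $n is the byte count; whatever is in $! is meaningless.
    print "ok: $n bytes; \$! is noise here\n";
} else {
    # Failure: only now does $! describe what went wrong.
    die "read failed: $!";
}
```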