williams has asked for the wisdom of the Perl Monks concerning the following question:

Somehow, it appears I'm getting an ECHILD from select(), which makes no sense to me. I'm writing a server that forks child processes to handle requests. They then exit. The server assigns a signal handler to SIGCHLD to reap dead children, per perlipc.
    $SIG{CHLD}=\&reaper;
    ...
    sub reaper {
        1 while waitpid -1,&POSIX::WNOHANG != -1;
        $SIG{CHLD}=\&reaper;
    }
If anywhere, this is where an ECHILD would make more sense. Instead, it shows up in the server's main loop, which is the following.
    use Errno;
    ...
    $select=new IO::Select();
    $select->add(...);
    ...
    while (1) {
        $!=undef;
        while (@ready=$select->can_read()) {
            handle($_) foreach @ready;
        }
        die if $select->count() == 0; #should always have at least one
        last unless $! == EINTR; #try again; interrupted by signal
    }
    die if $! == ECHILD;
The weird part is that the die at the end actually executes. How can this be? I thought ECHILD occurred only in wait() and waitpid(). I understand that system, piped opens, etc. do their own waiting and can generate ECHILD, but how can that error code ever show up where I'm seeing it?

For further background, I'm on RedHat Linux 4.5, running Perl 5.8.5. All sockets involved are IO::Sockets set to block.

This is just weird.

Jim

Replies are listed 'Best First'.
Re: Inexplicable ECHILD ("global")
by tye (Sage) on Aug 11, 2010 at 00:28 UTC
    If anywhere, this is where an ECHILD would make more sense.

    Such a statement only makes sense if $! is scoped to be different between your signal handler and your main loop. They are both in the same process. $! is global to the process. And signal handlers get run asynchronously compared to the main program flow.

    Add local($!); to your signal handler.

    - tye        

      That fixes it. I also localized $? for the same reason. Ignoring SIGCHLD fixes it too, but I have code to run in the reaper.

      Thanks,

      Jim
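
A minimal sketch of what local changes here, assuming a process with no children so that waitpid fails at once with ECHILD. The reapers are called directly to stand in for signal delivery; reaper_leaky and reaper_safe are illustrative names, and the waitpid call is written with parentheses and a > 0 test rather than copied from the post.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Errno qw(EINTR ECHILD);

# Reaper without localization: the final, failing waitpid leaves
# ECHILD behind in the process-wide $!.
sub reaper_leaky {
    1 while waitpid(-1, WNOHANG) > 0;
}

# Reaper with localization: $! and $? get their previous values back
# when the sub returns.
sub reaper_safe {
    local ($!, $?);
    1 while waitpid(-1, WNOHANG) > 0;
}

# Pretend select() has just failed with EINTR, then let the "handler" run.
$! = EINTR;
reaper_safe();
my $after_safe = $! + 0;

$! = EINTR;
reaper_leaky();
my $after_leaky = $! + 0;

printf "after reaper_safe:  %s\n", $after_safe  == EINTR  ? 'still EINTR' : $after_safe;
printf "after reaper_leaky: %s\n", $after_leaky == ECHILD ? 'now ECHILD'  : $after_leaky;
```

Localizing $? as well, as mentioned above, keeps the exit status of a system() or wait in the main program from being overwritten by the reaper.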

Re: Inexplicable ECHILD
by ikegami (Patriarch) on Aug 11, 2010 at 02:24 UTC

    In a bit more detail than above:

    The signal comes in, interrupting select inside of can_read. Between that op and the next, the signal handler is called. It calls waitpid until it returns an error, setting $! to ECHILD. The signal handler returns, and nothing changes $! between there and when you check it.
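
The waitpid step can be seen in isolation: once the only child has been reaped, a further waitpid returns -1 with $! set to ECHILD. A small sketch with no reaper installed; the use of POSIX::_exit in the child and the reset of $SIG{CHLD} to DEFAULT are choices made for this example, not something from the thread.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Errno qw(ECHILD);

# Make sure SIGCHLD is not being ignored, so the child stays waitable.
$SIG{CHLD} = 'DEFAULT';

my $pid = fork();
die "fork: $!" unless defined $pid;
POSIX::_exit(0) if $pid == 0;        # child: leave immediately

my $first = waitpid($pid, 0);        # blocking wait reaps the only child

$! = 0;
my $second = waitpid(-1, WNOHANG);   # nothing is left to reap
my $errno  = $! + 0;

printf "first wait reaped the child: %s\n", $first == $pid ? 'yes' : 'no';
printf "second wait returned %d, errno is ECHILD: %s\n",
    $second, $errno == ECHILD ? 'yes' : 'no';
```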

    By the way, there was talk of having certain variables automatically localised by the signal handler, including $!. I don't know if that's already in 5.12 or 5.14.

Re: Inexplicable ECHILD
by JavaFan (Canon) on Aug 11, 2010 at 09:21 UTC
    The weird part is that the die at the end actually executes. How can this be?
    Because you're using $! when it isn't meaningful. $! is only meaningful right after a failure in a system call. Anything else may set $! to whatever it wants. Only use $! right after a failing system call.
      That's not true. Even if he did it properly (say by adding next if @ready; before the first die), he'd see the same behaviour. waitpid unavoidably overwrites $!=EINTR with $!=ECHILD between the time select reports an error and the program checks the cause of the error.
      ...
      use Errno qw( EINTR );

      sub reaper {
          local ($!,$^E,$@);
          1 while waitpid -1,&POSIX::WNOHANG != -1;
          $SIG{CHLD} = \&reaper;
      }

      $SIG{CHLD} = \&reaper;
      ...
      while (1) {
          my @ready = $select->can_read();
          if (!@ready) {
              next if $! == EINTR;
              die "select: $!";
          }
          handle($_) foreach @ready;
      }

      Update: Added code.

      No. The OP is checking $! very soon (in time) after a failing system call (select inside can_read() from IO::Select). The signal handler adds a race condition that has little to do with checking $! at the wrong time.

      Granted, there is a small chance that $select->count() might (indirectly) trigger the setting of $! or that can_read() could return an empty list w/o select having failed, but the source of the former risk is actually what avoids the latter risk (and the former risk seems slight).

      The code could certainly be clearer on what was expected to have set $!. Saving off with something like $err= $!; immediately after the failure would add clarity. But it wouldn't eliminate the race condition and so wouldn't really fix the actual problem described.

      - tye
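
The "save it off" suggestion could look roughly like the sketch below. can_read_with_errno is a hypothetical helper, not part of IO::Select, and the pipe only serves to make the example runnable. As noted above, copying $! right after the call makes the intent clearer but does not close the window in which a signal handler can run, so the handler still needs its own local($!).

```perl
use strict;
use warnings;
use IO::Select;
use Errno qw(EINTR);

# Hypothetical helper: returns the ready handles together with the errno
# value copied on the statement right after can_read().
sub can_read_with_errno {
    my ($select, $timeout) = @_;
    $! = 0;
    my @ready = $select->can_read($timeout);
    my $err   = $! + 0;    # copy before anything else can overwrite $!
    return (\@ready, $err);
}

pipe(my $rd, my $wr) or die "pipe: $!";
my $select = IO::Select->new($rd);

# Nothing written yet: a zero timeout yields an empty list without an
# error, so an empty list alone does not prove that select failed.
my ($idle) = can_read_with_errno($select, 0);

syswrite($wr, "x") or die "syswrite: $!";
my ($busy, $err) = can_read_with_errno($select, 1);

printf "idle: %d handle(s), busy: %d handle(s)\n", scalar @$idle, scalar @$busy;
print "interrupted by a signal\n" if !@$busy && $err == EINTR;
```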