kscaldef has asked for the wisdom of the Perl Monks concerning the following question:

All the documentation I've encountered claims that $? should be non-negative (more specifically, an unsigned 16-bit integer). However, I have some code where $? is getting set to -1 and I have no idea what this means.

The code in question spawns a bunch of worker children, with a $SIG{CHLD} handler to keep track of when they all finish, which looks like this (with some added debugging statements):

    sub REAPER {
        while ((my $pid = waitpid(-1, &WNOHANG)) > 0) {
            if (WIFEXITED($?)) {
            }
            elsif (WIFSIGNALED($?)) {
                print "$?\n", $? >> 8, "\n", $? & 127, "\n";
                print("child exited with signal: ", WTERMSIG($?), "\n");
            }
            elsif (WIFSTOPPED($?)) {
                print "child stopped????\n";
                next;
            }
            else {
                print "hmmm\n";
            }
            delete $children{$pid};
            $children--;
            print "$children children running\n";
        }
        $SIG{CHLD} = \&REAPER;  # probably not needed, but paranoia
    }

Occasionally, this produces output like:

...
7 children running
6 children running
-1
72057594037927935
127
child exited with signal: 127
5 children running
...
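
The three debug numbers in that output are what the handler's own expressions produce when $? holds -1, which can be checked in isolation. This is a minimal sketch, assuming a perl built with 64-bit integers; the variable names are illustrative and not from the original script.

```perl
use strict;
use warnings;

# Assumption: $? was -1 at the moment the handler's debug print ran.
my $status = -1;

# Without "use integer", Perl's bit operators treat their operands as
# unsigned, so -1 behaves like an all-ones bit pattern.
my $high = $status >> 8;     # the handler's "$? >> 8"
my $low  = $status & 127;    # the handler's "$? & 127"

print "$status\n$high\n$low\n";
```

On such a build this prints -1, 72057594037927935 and 127, matching the debug lines above. That suggests the "signal 127" is just the low seven bits of -1 rather than a real signal number.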

I think that there might be some sort of race condition or reentrancy problem involved: when I purposely space the children out so that their exits are well separated, this never happens. In normal operation, though, they do very similar amounts of work and therefore finish very close to one another. My understanding is that while the SIGCHLD handler is executing, any other SIGCHLD should be blocked (thus the while loop), so I wouldn't have thought this would be an issue.

So, I ask you, what is going on here? Is the documentation incorrect, am I screwing something up, or is this a bug in perl? I'm using 5.8.0, btw.

Replies are listed 'Best First'.
Re: $? is -1???
by davidj (Priest) on Jun 20, 2004 at 04:16 UTC
    with a $SIG{CHLD} handler to keep track of when they all finish

    This is a direct quote from the perlvar manpage:

    If you have installed a signal handler for "SIGCHLD", the value of $? will usually be wrong outside that handler.

    If, as you say, it should never be a negative number, this could possibly be the explanation.
    davidj
      Yeah, I read that, but the sub I posted is the SIGCHLD handler, so it doesn't seem that that applies.
Re: $? is -1???
by holo (Monk) on Jun 20, 2004 at 19:22 UTC

    From perldoc -f system:

    You can check all the failure possibilities by inspecting $? like this:

        if ($? == -1) {
            print "failed to execute: $!\n";
        }
        elsif ($? & 127) {
            printf "child died with signal %d, %s coredump\n",
                ($? & 127), ($? & 128) ? 'with' : 'without';
        }
        else {
            printf "child exited with value %d\n", $? >> 8;
        }

    Update: In the case of waitpid, this $? == -1 means that the child has been reaped automatically. I encountered this some time ago in a similar snippet. I do not know exactly why but a similar effect can be obtained by:

    I cannot find the program I was working on but if I'm not mistaken, I got rid of all negative ones by simplifying the SIGCHLD handler. Store $? in the hash instead of testing it and test everything when all children are done.
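
The "store now, test later" idea can be sketched as a small decoder that runs over the saved values once all children are done. This is an illustration only: describe_status, %saved, and the PIDs and status values in it are hypothetical, and the bit tests are the ones from the perldoc -f system excerpt quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a saved raw $? value into a readable string.
sub describe_status {
    my ($raw) = @_;

    # -1 is not a real wait status, so check it before any bit tests.
    return "no status available (-1)" if $raw == -1;

    my $signal = $raw & 127;
    return sprintf("died with signal %d, %s coredump",
                   $signal, ($raw & 128) ? 'with' : 'without')
        if $signal;

    return sprintf("exited with value %d", $raw >> 8);
}

# pid => raw $?; both the PIDs and the values are made up for this sketch.
my %saved = (1001 => 0, 1002 => -1, 1003 => 256, 1004 => 9);

for my $pid (sort { $a <=> $b } keys %saved) {
    print "$pid: ", describe_status($saved{$pid}), "\n";
}
```

Checking for -1 first matters because, as the question's output shows, applying the bit tests to -1 makes it look like a death by signal 127.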

      I'm actually starting the child processes with fork(), not system(), and I do check to make sure the fork was successful. If the child exited right away, I might agree that this could be the explanation, but the problem child lives for just as long as all the others. (The script prints out "1 child running" ... "20 children running" ... time passes while children do work ... "19 children running" "18 children running" ... "0 children running", then exits). There is every indication that the child process does all the work it's supposed to, and no indication that it actually exits abnormally in any way.
Re: $? is -1???
by dga (Hermit) on Jun 21, 2004 at 13:39 UTC

    waitpid returns -1 'if there is no such child process'. If you call it with none of the children still living, this seems like a reasonable expectation.

    Unless it has changed a lot in 5.8 the old advice was to 'Do as little as possible in your signal handler, like writing a status into an already defined and allocated variable' or 'Pretend like the signal stuff is reentrant but be aware that it may not always work that way.' I know the p5p people are working on solid fully reentrant signals but I do not know the progress toward that goal in 5.8.

    How about preparing a hash for the children when they are forked and then saving the exit status uninterpreted into that data space?

        while ((my $pid = waitpid(-1, &WNOHANG)) > 0) {
            $children{$pid} = $?;
            # $children--;                            # pushing your luck
            # print "$children children running\n";   # really pushing
        }

    I think having a print in a signal handler may be too much time spent in the handler, waiting to be interrupted by another signal. If you decremented the variable but did not print, the parent could run a polling loop, say once a second, and just print out the $children variable. You would have to arrange for the signal handler to see the same $children as the parent, maybe with a file-scoped lexical or something. This would get the print out of the handler; the parent's print will probably skip numbers when more than one child exits during a second, but it should roughly keep up. Alternatively, you could count the children with statuses in the hash and subtract that from the total number of children to get the count, which would leave the handler updating only the children hash.
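
The last variant described here (the handler only fills the hash, and the parent derives the count while polling) might look roughly like the following. It is a sketch under assumptions, not the original script: $total, %exits and the one-second child workload are invented for illustration, and POSIX::_exit is used in the child only so that it leaves without running any of the parent's cleanup code.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

my $total = 3;    # number of workers; arbitrary for this sketch
my %exits;        # pid => raw $?, written only by the handler

$SIG{CHLD} = sub {
    local $!;     # keep the parent's errno intact
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exits{$pid} = $?;    # record only; no printing, no counting
    }
};

for my $n (1 .. $total) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        sleep 1;              # stand-in for real work
        POSIX::_exit(0);      # child leaves immediately
    }
}

# The parent polls about once a second; the running count is derived
# from the hash instead of being decremented inside the handler.
while ((my $running = $total - scalar(keys %exits)) > 0) {
    print "$running children running\n";
    sleep 1;                  # returns early when a SIGCHLD arrives
}
print "0 children running\n";
```

Because the children finish close together, the printed count may skip values, which is the trade-off mentioned above.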

      I've cut the handler down as much as I think I can to just:

          while ((my $pid = waitpid(-1, &WNOHANG)) > 0) {
              $children--;         # need this to tell when we're done
              $exits{$pid} = $?;   # need this to see the problem
          }

      then I dump %exits at the end of the script. However, I still see output like:

      18106 0
      18116 0
      18113 -1
      18105 0
      
Re: $? is -1???
by bluto (Curate) on Jun 21, 2004 at 19:22 UTC
    After you fork off the children is the parent using stuff like system/backticks/qx/etc? I've seen cases in the past where my SIGCHLD reaped the child of a system() call (and I wouldn't be surprised if system() on some machines could reap a child I forked as well). These resulted in the '-1' being returned.

    Since I'm paranoid about signal handlers and I tend to have code where the parent just hangs around waiting for the children, I don't usually install a SIGCHLD handler, but just loop with waitpid/sleep on the PIDs I care about. If you don't have this luxury, try changing your waitpid to wait only for the PIDs you care about.

      No, the parent just waits around doing nothing (except sleeping).

      On your advice, I tried removing the SIGCHLD handler and moving its contents into the wait loop in the parent, so it now does:

          while ($children) {
              while ((my $pid = waitpid(-1, &WNOHANG)) > 0) {
                  print "$?\n" if $?;
                  delete $children{$pid};
                  $children--;
                  print "$children children running\n";
              }
              sleep;
          }

      where it used to just sleep.

      However, this never terminated for me. My child processes become zombies and waitpid() never gives me anything back and I just loop forever. I also tried looping through the keys of the %children hash and waiting on each PID, but that didn't work any better for me. Am I misunderstanding something about your technique?

        With no arguments, the call to 'sleep' will sleep forever. Try something like 'sleep 1'.

        Is there a reason you are using WNOHANG? It seems like you could just use the blocking form of waitpid (unless I'm missing something).

        Other than that this looks ok. The main reason I mentioned waiting on each individual pid was that this comes in handy if your parent actually did use something like system().
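
The handler-free approach discussed in this sub-thread (remember each PID at fork time, then use the blocking form of waitpid on exactly those PIDs) could be sketched as follows. The names @pids and %exits and the distinct exit codes are assumptions made for the example; setting SIGCHLD to DEFAULT is only there to make the sketch independent of whatever disposition it inherits.

```perl
use strict;
use warnings;
use POSIX ();

$SIG{CHLD} = 'DEFAULT';    # no handler: the loop below does all the reaping

my @pids;
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        POSIX::_exit($n);  # child: exit with a distinct code
    }
    push @pids, $pid;
}

my %exits;                 # pid => raw $?
for my $pid (@pids) {
    # Flags of 0 make waitpid block until this particular child exits.
    my $reaped = waitpid($pid, 0);
    if ($reaped == $pid) {
        $exits{$pid} = $?;
    }
    else {
        warn "waitpid($pid) returned $reaped: $!\n";
    }
}

for my $pid (@pids) {
    next unless exists $exits{$pid};
    printf "%d exited with value %d\n", $pid, $exits{$pid} >> 8;
}
```

Waiting on specific PIDs means a child started elsewhere, for example by system(), is never reaped by this loop. That is the case described earlier in this sub-thread.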