Earindil has asked for the wisdom of the Perl Monks concerning the following question:

Have a monitoring app written in perl that forks off a bunch of sub processes. Sometimes, these sub processes hang on bad nodes and won't die off. Been trying to use sig alrm to kill them but I just can't seem to get this right. Forking still confuses me a bit, so guessing I'm just not quite getting it here. Would appreciate any input.

Here is some sample code. Every fork should get killed off due to the sleep but isn't.
A bit of the code here was taken from examples I found, some of it probably isn't necessary.

#!/bin/perl for ($i=0; $i<=10; $i++) { wait_for_a_kid() if keys %pid_to_node > 3; $pid = fork; if ($pid) { ## parent does... $pid_to_node{$pid} = $i; } else { print "$i $$\n"; local $SIG {ALRM} = sub { kill -15, $$ or die "kill: $!"; print "\tKilled PID $$\n"}; # Just SIGTERM eval { ## child does... setpgrp(0,0); exit !&Test; alarm 1; waitpid $pid => 0; }; } } ## final reap: 1 while wait_for_a_kid(); sub wait_for_a_kid { my $pid = wait; return 0 if $pid < 0; my $node = delete $pid_to_node{$pid} or warn("Why did I see $p +id ($?)\n"), next; } sub Test { sleep 10; }
OUTPUT:

0 1668
1 1669
2 1670
3 1671
Why did I see 1668 (0)
5 1811
6 1812
7 1813
8 1814
9 1920
10 1921

Replies are listed 'Best First'.
Re: Killing Forks
by jbert (Priest) on Aug 14, 2007 at 12:58 UTC
    You're passing a -ve signal number to kill. From kill:

    "Unlike in the shell, if SIGNAL is negative, it kills process groups instead of processes. (On System V, a negative PROCESS number will also kill process groups, but that’s not portable.) That means you usually want to use positive not negative signals. You may also use a signal name in quotes."

    You probably also want to set the alarm before doing the operation which may take a long time.

    I'm not sure, but you might also be able to set the alarm handler (and possibly also call alarm) before the fork. I haven't checked, but those two properties might both be preserved across fork.

      Thanks much! Simply moved a few things around and changed the kill to a 15 from a -15 and that seemed to do it. New Version:
      #!/etrade/bin/perl for ($i=0; $i<=10; $i++) { wait_for_a_kid() if keys %pid_to_node > 3; $pid = fork; if ($pid) { ## parent does... $pid_to_node{$pid} = $i; } else { print "$i $$\n"; local $SIG {ALRM} = sub { print "\tKilled PID $$\n"; kill 15, $$ or die "kill: $!"}; # Just SIGTERM alarm 6; eval { ## child does... exit !&Test; waitpid $pid => 0; }; } } ## final reap: 1 while wait_for_a_kid(); sub wait_for_a_kid { my $pid = wait; return 0 if $pid < 0; delete $pid_to_node{$pid} or warn("Why did I see $pid ($?)\n") +, next; } sub Test { sleep 5; }
        Glad to know it's fixed.

        As I think about it a little, though, sending yourself a SIGTERM is a bit redundant, you could of course just 'die' or 'exit' in the SIGALRM handler.

        Also...you're calling 'waitpid' in the child (which is odd) after a call to exit (which is odder).

        The other way of doing this, of course, is to get the parent to periodically wake up (you can do this with waitpid and a sleep) and calculate if it should kill off a child (just store each pid's start time and kill off the old ones, maybe keeping them in a sorted list of age or something).

        One case where you might need to have the parent do it is if your long-running code calls out to some XS code which doesn't return up to perl when it receives a signal.

        Since perl 5.8, perl uses 'safe signals' by default, which means that your signal handlers only fire when control returns to the perl interpreter. This can be a long time (or never) when you call out of perl into C/XS code. (Safe signals are 'better', but you sometimes need to be aware of this issue when doing signal handling in perl).