Safety of using alarm

delirium has asked for the wisdom of the Perl Monks concerning the following question:

My group has been having intermittent problems with hanging processes. These programs are binaries and we don't have access to the source code to investigate what is causing the problems. The dead processes cause other problems that I won't go into, but suffice to say that we need an automated way of killing them when they hang.

Today I hacked together this wrapper script with the simple idea that if the problem binary's logfile hasn't been updated in 3 minutes, the process is dead and is safe to kill. This is our standard criterion for killing this process.

It works famously in testing, but before I migrate it, I wanted to get some feedback about using the alarm function the way I am. Is it safe to reset the alarm in the $SIG{ALRM} intercept? Is it possible for $SIG{ALRM} to not get called after the timeout? Are there handy modules I've never stumbled across that account for wacky edge cases?

This is for AIX Unix with Perl 5.6.0.

my $timeout = 60;
my $logfile = 'logfile.txt';
my @array = ('the_binary -option1', 'the_binary -option2', 'the_binary -option3');

for (@array) {
    print "Running command $_\n";
    eval {
        local $SIG{ALRM} = sub {
            my $mod_time = time - (stat($logfile))[9];
            if ( $mod_time > 180 ) {
                die "alarm\n";
            }
            else {
                alarm 0;
                alarm $timeout;
            }
        };
        alarm $timeout;
        system($_);
        alarm 0;
    };
}

Re: Safety of using alarm by Abigail-II (Bishop) on Mar 10, 2004 at 21:54 UTC
Prior to perl 5.8.0, signals in Perl are unsafe. This is because signals are handled immediately when they arrive, regardless of whether the internals of perl are in a consistent state or not. Furthermore, lots of functions (including many of the implicit ones, like looking at a variable) in Perl aren't reentrant safe.
Abigail
Re: Re: Safety of using alarm by delirium (Chaplain) on Mar 11, 2004 at 13:17 UTC
Thanks. Are you aware of any alternatives that are safer with 5.6.0? What type of problems might occur with the way I have things set up now? Segfaults out of the blue?
Re: Safety of using alarm by Abigail-II (Bishop) on Mar 11, 2004 at 13:27 UTC
"Are you aware of any alternatives that are safer with 5.6.0?"
Yes. Use something other than Perl.
"What type of problems might occur with the way I have things set up now? Segfaults out of the blue?"
Anything can happen. Segfaults. Data corruption. Pink monkeys flying out of your manager's ears. Don't think that unsafe signals means their damage is contained in a neatly defined box. In 5.6.0 signals are UNSAFE, and that's non-negotiable.
Abigail
Re: Re: Safety of using alarm by delirium (Chaplain) on Mar 11, 2004 at 15:22 UTC
Re: Safety of using alarm by Abigail-II (Bishop) on Mar 11, 2004 at 17:09 UTC
Re: Safety of using alarm by ambrus (Abbot) on Mar 10, 2004 at 21:46 UTC
(One problem with that approach is that if the program is frozen (or just waits too long) in a system call (most probably a read), the $SIG{ALRM} handler won't be delivered until that call has finished. ~~Also, it can be a problem if the program uses alarm. Note that on some systems, the sleep function may use alarm.~~)
Deleted: I didn't read the question properly. The third paragraph still holds.
You might be better off setting a CPU limit with ulimit or the setrlimit syscall, although I'm not sure how you can do that in Perl.
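One possible way to get a CPU limit from Perl without extra modules is to let the shell set it just before the binary starts. The sketch below is only an illustration: with_cpu_limit is a hypothetical helper name, and it assumes that system() hands the string to a /bin/sh whose ulimit builtin accepts -t (CPU seconds), which is common but not guaranteed.

```perl
use strict;
use warnings;

# Hypothetical helper: build a shell command line that caps CPU time
# (in seconds) and then replaces the shell with the real command.
# Assumes the shell's ulimit builtin supports -t.
sub with_cpu_limit {
    my ($cpu_seconds, $command) = @_;
    die "CPU limit must be a positive integer\n"
        unless $cpu_seconds =~ /^[1-9]\d*$/;
    return "ulimit -t $cpu_seconds; exec $command";
}

# Usage (the ';' makes Perl run the string through the shell):
#   system(with_cpu_limit(60, 'the_binary -option1'));
```

Note that a CPU limit only stops a process that is burning CPU time. A process that hangs while blocked or sleeping accumulates almost no CPU seconds, so it would not be caught this way.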
Re: Safety of using alarm by bluto (Curate) on Mar 11, 2004 at 16:32 UTC
One way to do this in perl safely, without too much more code, is to fork/exec and have the parent just execute a non-blocking waitpid followed by a short sleep in a loop until the time limit expires, and then kill the child. If you decide to use alarm, it is a good idea to have the parent process kill the child when the alarm occurs (I've seen cases where the alarm doesn't kill the child process and it ends up sticking around).
bluto
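A minimal sketch of the fork/exec and non-blocking waitpid loop described above, combined with the logfile-staleness check from the original question. The name run_with_watchdog, its parameters, and the undef return for a killed child are assumptions made for illustration, as is the choice to measure idle time from the later of the log's modification time and the child's start time. It uses no signal handlers at all.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: run @$cmd, polling until it exits. If $logfile has
# not been modified for more than $max_idle seconds, kill the child.
# Returns the child's wait status ($?), or undef if it was killed as hung.
sub run_with_watchdog {
    my ($cmd, $logfile, $max_idle, $poll) = @_;

    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;

    if ($pid == 0) {
        # Child: replace this process with the command.
        exec(@$cmd) or do { warn "exec failed: $!\n"; POSIX::_exit(127) };
    }

    my $started = time;
    while (1) {
        # Non-blocking reap: returns $pid once the child has exited.
        return $? if waitpid($pid, WNOHANG) == $pid;

        # Ignore a stale log left over from an earlier run.
        my $mtime = (stat($logfile))[9];
        my $last  = (defined $mtime && $mtime > $started) ? $mtime : $started;

        if (time - $last > $max_idle) {
            kill 'TERM', $pid;              # ask politely first
            sleep 2;
            if (waitpid($pid, WNOHANG) == 0) {
                kill 'KILL', $pid;          # still running: force it
                waitpid($pid, 0);
            }
            return undef;
        }
        sleep $poll;
    }
}
```

With the values from the question, a call might look like run_with_watchdog(['the_binary', '-option1'], 'logfile.txt', 180, 5). Passing the command as a list keeps exec from going through the shell, so the pid being killed is the binary itself and not an intermediate shell.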