BUG! die() executing outside scope of eval block

jdhedden has asked for the wisdom of the Perl Monks concerning the following question:

The following code fragments come from a much larger program:

    my $chld_flag = 0;

    $SIG{'CHLD'} = sub { $chld_flag = 1; };
    $SIG{'ALRM'} = 'IGNORE';

    while (1) {
        # Check child processes
        #   Includes using waitpid() to clean up after child processes

        # Display status

        # Get user commands
        my $cmd = '';
        eval {
            # Allow interrupts by alarms and 'child' events
            local $SIG{'CHLD'} = sub { die("CHLD EVENT\n"); };
            local $SIG{'ALRM'} = sub { die("ALRM EVENT\n"); };

            alarm(10);

            if (! $chld_flag) {
                $cmd = <STDIN>;
            }
        };
        alarm(0);
        $chld_flag = 0;

        # Process user command, if any
        #   Such as launch child processes
    }

I believe there is a bug in Perl (5.8.2) in that this code occasionally exits due to die() - reporting 'CHLD EVENT' even though the eval block should prevent this from happening. (A test program that recreates this bug is provided below.)

I believe the problem occurs when a SIGALRM and SIGCHLD occur very close together. The SIGALRM causes the program to exit the eval block's scope with the die("ALRM EVENT\n"). This is okay.

However, before the %SIG hash is restored with the original SIGCHLD handler (which does not contain a die()), a SIGCHLD occurs and the eval block's local scope SIGCHLD handler (which does contain a die()) is invoked. This is the bug.
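
The behaviour the code above relies on can be sketched in isolation. This is only an illustration of the expected scoping of `local $SIG{...}`, not a reproduction of the race: the names `$outer`, `$seen_inside` and `$restored` are made up for the example, SIGUSR1 stands in for SIGCHLD, and a plain die() stands in for the one raised by the ALRM handler.

```perl
use strict;
use warnings;

my $flag  = 0;
my $outer = sub { $flag = 1 };           # global handler: only sets a flag
$SIG{USR1} = $outer;

my $seen_inside;
eval {
    # Local handler: meant to be in effect only until the eval block is left
    local $SIG{USR1} = sub { die "Timeout\n" };
    $seen_inside = $SIG{USR1};
    die "leaving the block early\n";     # stands in for the ALRM-triggered die()
};

# Back outside the eval, the global handler should be installed again,
# so a signal arriving now is expected to set the flag rather than die.
my $restored = ($SIG{USR1} == $outer) ? 1 : 0;
kill 'USR1', $$;                         # send ourselves the signal
print "restored=$restored flag=$flag\n";
```

The suspected bug is about what happens in the short window between the die() and the point where `$restored` would become true.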

I have another very different case that leads me to the same conclusion: A parent process monitors child processes and sends SIGUSR1s to them as needed. The child processes have a global SIGUSR1 handler that sets a flag, and inside an eval block a local SIGUSR1 handler that has a die().

    $SIG{'USR1'} = sub { $flag = 1; };

    eval {
        local $SIG{'USR1'} = sub { die "Timeout\n"; };

        DoWork();
    };

Occasionally, the parent finds child processes for which waitpid() leaves ($? >> 8) == 255, which indicates that the child exited due to a die(). This again shows that the Perl interpreter is allowing signals to be processed just beyond an eval block's scope, but before the %SIG hash has been restored from changes made inside the eval block.
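
For reference, the status that waitpid() leaves in `$?` packs the exit code into the high byte and the terminating signal into the low seven bits. A small sketch of how the parent could take it apart follows; `decode_status` is a hypothetical helper, not part of the original program. Note that an uncaught die() exits with 255 only when both `$!` and `$? >> 8` are zero at that point, so 255 is a strong hint rather than proof.

```perl
use strict;
use warnings;

# Hypothetical helper: split a wait status (the value of $?) into its parts.
sub decode_status {
    my ($status) = @_;
    return {
        exit_code => $status >> 8,               # value passed to exit()
        signal    => $status & 127,              # signal that killed the child, if any
        core_dump => ($status & 128) ? 1 : 0,    # set when a core file was written
    };
}

# A child that ended through an uncaught die() typically shows up like this:
my $st = decode_status(255 << 8);
print "exit=$st->{exit_code} signal=$st->{signal}\n";   # prints "exit=255 signal=0"
```

In the parent this would be called as `decode_status($?)` right after a waitpid() that returned the child's PID.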

Would someone familiar with the Perl interpreter's code please verify this problem: Namely, that signal handling is (improperly) allowed to occur following an eval block but before the %SIG hash has been restored to remove any signal handlers local to the eval block.

UPDATE:
As a workaround, I'm embedding each eval block inside another eval block to 'catch' the errant die()s.

    eval {
        eval {
            # Allow interrupts by alarms and 'child' events
            local $SIG{'CHLD'} = sub { die("CHLD EVENT\n"); };
            local $SIG{'ALRM'} = sub { die("ALRM EVENT\n"); };

            alarm(10);

            if (! $chld_flag) {
                $cmd = <STDIN>;
            }
        };
    };

Is there a better workaround?

2nd UPDATE:
I now have test code to prove this bug exists!!!

#!/usr/bin/perl

#####
#
# Test program to reproduce the following Perl bug:
#    It is possible for a signal handler defined locally inside an
#    eval block to be executed outside the scope of the eval block.
#
# Just execute this Perl script.
# It will eventually (after a few minutes) exit when the bug occurs.
#
#####

use strict;
use warnings;

use Time::HiRes qw( usleep );

my $CHILD_MAX   = 25;  # Max number of children to run
my $child_count = 0;   # Count of children currently running
my $child_done  = 0;   # Flag that a child has terminated
my %child_pids;        # Holds child processes' PIDs

# Set the flag that a child has terminated
$SIG{'CHLD'} = sub { $child_done = 1; };

# Loop until the bug occurs
do {
    # Cleanup any terminated children
    if ($child_done) {
        $child_done = 0;

        # Check all child processes using non-blocking waitpid() call
        foreach my $pid (keys(%child_pids)) {
            if (waitpid($pid, 1) == $pid) {   # 1 = POSIX::WNOHANG
                delete($child_pids{$pid});
                $child_count--;
            }
        }
    }

    # Start more children
    while ($child_count < $CHILD_MAX) {
        my $pid;
        if (($pid = fork()) == 0) {
            # Child sleeps for a random amount of time and then exits
            my $usec = 950000 + int(rand(100000));
            usleep($usec);
            exit(0);
        }

        # Parent remembers the child's PID for later cleanup
        $child_pids{$pid} = undef;
        $child_count++;
    }

    # Try to recreate the bug
    eval {
        eval {
            # Local signal handler to 'kill' the sleep() call below
            local $SIG{'CHLD'} = sub { die("SIGCHLD\n"); };

            sleep(1);   # Hang around a bit
        };

        # Set the flag for cleaning up terminated child processes
        if ($@ && ($@ =~ /CHLD/)) {
            $child_done = 1;
        }
    };

    # Keep looping until the bug occurs
} while (! $@);

# When we get here, it shows that the signal handler
# defined inside the inner eval block above was 
# executed OUTSIDE the scope of the inner eval block!

print("Bug detected: $@");

exit(1);

# EOF

Based on this, I have submitted this to perlbug@perl.org.


Re: BUG? die() executing outside scope of eval block by Zaxo (Archbishop) on Dec 19, 2003 at 04:57 UTC
This is interesting. What happens if you say `$ PERL_SIGNALS=unsafe perl myscript.pl` to run this (or whatever you and your shell like for setting environment variables)? See `perldoc perldelta`, where it talks about the reintroduction of unsafe signal handling. The user docs are in perlrun for the PERL_SIGNALS environment variable, and in perlipc for the discussion of deferred signals.

I am not sure whether a deferred 'safe' signal drags its initial sigaction along with it, or uses the one it finds when it is acknowledged. That could be OS-dependent.

After Compline,
Zaxo
Re: BUG? die() executing outside scope of eval block by liz (Monsignor) on Dec 19, 2003 at 10:08 UTC
> I believe the problem occurs when a SIGALRM and SIGCHLD occur very close together. The SIGALRM causes the program to exit the eval block's scope with the die("ALRM EVENT\n").

Could it be the other way around? I see that you're switching off the alarm outside of the eval. Maybe you should switch off the alarm inside the SIGCHLD handler:

    -local $SIG{'CHLD'} = sub { die("CHLD EVENT\n"); };
    +local $SIG{'CHLD'} = sub { alarm(0); die("CHLD EVENT\n"); };

Another approach would be to return to using unsafe signal handlers inside the eval, see POSIX's sigaction() for that.

Liz

Update:
Actually, Perl 5.8.1 introduced a new idiom for temporarily allowing unsafe signals:

    local $ENV{PERL_SIGNALS} = 'unsafe';

From perl581delta.pod:

> Unsafe signals again available
>
> In Perl 5.8.0 the so-called "safe signals" were introduced. This means that Perl no longer handles signals immediately but instead "between opcodes", when it is safe to do so. The earlier immediate handling easily could corrupt the internal state of Perl, resulting in mysterious crashes.
>
> However, the new safer model has its problems too. Because now an opcode, a basic unit of Perl execution, is never interrupted but instead let to run to completion, certain operations that can take a long time now really do take a long time. For example, certain network operations have their own blocking and timeout mechanisms, and being able to interrupt them immediately would be nice.
>
> Therefore perl 5.8.1 introduces a "backdoor" to restore the pre-5.8.0 (pre-5.7.3, really) signal behaviour. Just set the environment variable PERL_SIGNALS to "unsafe", and the old immediate (and unsafe) signal handling behaviour returns. See "PERL_SIGNALS" in perlrun and "Deferred Signals (Safe Signals)" in perlipc.
>
> In completely unrelated news, you can now use safe signals with POSIX::SigAction. See "POSIX::SigAction" in POSIX.
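
The POSIX sigaction() route mentioned in this reply might look roughly like the sketch below. It is written under assumptions: `with_timeout` is a hypothetical helper invented for the example, and it relies on handlers installed through POSIX::sigaction being delivered immediately unless the action is marked safe. Whether this closes the window described in the original post has not been verified here.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Hypothetical helper: run $code, giving up after $seconds.
# Returns undef on success, or the error text from $@ otherwise.
sub with_timeout {
    my ($seconds, $code) = @_;

    # Handler installed via sigaction(); SIGALRM is masked while it runs
    my $action = POSIX::SigAction->new(
        sub { die "ALRM EVENT\n" },
        POSIX::SigSet->new(SIGALRM),
    );
    my $old = POSIX::SigAction->new;
    POSIX::sigaction(SIGALRM, $action, $old)
        or die "sigaction failed: $!";

    my $ok = eval {
        alarm($seconds);
        $code->();
        alarm(0);
        1;
    };
    my $err = $@;

    alarm(0);                              # make sure no alarm is left pending
    POSIX::sigaction(SIGALRM, $old);       # put the previous action back
    return $ok ? undef : $err;
}

my $err = with_timeout(1, sub { sleep 5 });
print defined $err ? "interrupted: $err" : "finished in time\n";
```

Unlike `local $SIG{ALRM}`, the previous action is restored explicitly here, so the restore happens at a point the program controls rather than during scope exit.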
Re: Re: BUG? die() executing outside scope of eval block by jdhedden (Deacon) on Dec 19, 2003 at 13:25 UTC
Using unsafe signals sounds like a step backwards. Besides, why would that even work? The 'bug' is that there is a vulnerable gap between the end of the eval block's scope and the restoration of the %SIG hash. This vulnerability exists whether you're using safe or unsafe signals.

The alarm(0) in the CHLD handler is a good idea, but this doesn't relate at all to the second scenario I presented (parent->SIGUSR1->child). So while it might be a workaround for the first case, it is not a general solution.

I think the key issue is illustrated by this phrase:

> Perl no longer handles signals immediately but instead "between opcodes", when it is safe to do so.

While this sounds good, it still does not say whether or not this 'bug' exists. If one opcode is the end of the eval block, and the next opcode is to start restoring %SIG, but a signal occurs in between them, then that is a bug. It is not enough to defer signal handling to the points between opcodes. Perl needs to go further and ensure safe signal handling when the %SIG hash is modified inside an eval block.

But again, my original request still stands. Would someone familiar with the Perl interpreter's code please verify what I am saying? Is there a 'gap' between the end of an eval block and the restoration of the %SIG hash such that an incoming signal could result in execution of the (now defunct) local signal handler that was inside the eval block?
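
For the first scenario (waiting for user input with a timeout), one way to sidestep the question is to keep die() out of the signal handlers altogether and let select() provide the timeout. The following is only a sketch: `read_line_or_event` is a hypothetical helper, it takes any filehandle so that it can be tried with a pipe instead of STDIN, and it assumes that an arriving signal interrupts the underlying select() call, which is the usual behaviour on Unix-like systems.

```perl
use strict;
use warnings;
use IO::Select;

my $chld_flag = 0;
$SIG{CHLD} = sub { $chld_flag = 1 };       # flag only; this handler never dies

# Hypothetical helper: wait up to $timeout seconds for a line on $fh.
# Returns the line, or undef on timeout or when a signal cut the wait short.
sub read_line_or_event {
    my ($fh, $timeout) = @_;
    return undef if $chld_flag;            # a child has already terminated

    my $sel = IO::Select->new($fh);
    my ($ready) = $sel->can_read($timeout);
    return undef unless $ready;            # timed out or interrupted

    # Caveat: readline is buffered and blocks if only part of a line is there
    return scalar <$fh>;
}

# Usage with a pipe standing in for STDIN
pipe(my $reader, my $writer) or die "pipe failed: $!";
syswrite($writer, "status\n");
my $cmd = read_line_or_event($reader, 10);
print "got: $cmd" if defined $cmd;
```

A signal that lands between the flag check and the select() call is not noticed until the timeout expires, so the timeout bounds the delay but does not remove it. This does not cover the second (SIGUSR1) scenario, where the work being interrupted is not a wait on a filehandle.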