gnosek has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I encountered a problem which is best represented as Perl code:

#!/usr/bin/perl use warnings FATAL => qw( all ); use strict; $SIG{ALRM} = sub { print "SIGARLM: main handler\n" }; sub f { local $SIG{ALRM} = sub { print "SIGALRM\n" }; # line 9 } if (fork) { # parent f while 1; } else { # child sleep 1; print "child starting\n"; 1 while kill ALRM => getppid; }

When I run this code, the parent process dies within seconds of the child entering the loop. It looks like the default handler gets occasionally executed (the really default one, terminating the process). I have also seen things like:

Unable to create sub named "" at testdrive/signal.pl line 9.
and (in a less stripped-down chunk of code throwing exceptions instead of printing stuff) just about every possible error message and several impossible ones.

I tested this on Debian and Ubuntu systems, both brand new and several years old, with perls 5.8.4, 5.8.8 and 5.10.0. Setting the signal mask manually (via sigprocmask) doesn't help either. Setting the handler without using local also doesn't change anything. I couldn't reproduce it under strace, due to changes in timing or weird things that strace does to signal handlers, I don't really know.

Any ideas?

Thanks,
Grzegorz Nosek

UPDATE:

Yes, the code is silly. My point is atomicity of setting signal handlers, not the most efficient way of catching SIGALRM :)

UPDATE 2:

local $SIG{FOO} = \&bar; is unsafe (the signal handler gets reset to SIG_DFL for a short moment, possibly crashing the process). I haven't been able to wrap it in sigprocmask safely in any way.

Setting $SIG{FOO} without local is apparently safe.

UPDATE 3:

A signal handler may sometimes leak out of an eval block even when it seems impossible from looking at Perl code. A simple workaround is to yield the CPU to allow signal delivery (Time::HiRes and sleep 1.01e-6 seems sufficient, although sleep 1e-6 is not). I guess it's nasty enough for a workaround.

UPDATE 4:

My real-world code still doesn't work though...

Replies are listed 'Best First'.
Re: Setting signal handlers considered unsafe?
by TGI (Parson) on Nov 05, 2008 at 18:45 UTC

    Have you tried any of the signal stuff in POSIX? I'm thinking of sigprocmask in particular.

    I'm no signal wizard, but try suspending the ALRM signal while you set the handler. When you reenable the ALRM handler, any pending ALRMs should be triggered.

    Let us know what you come up with, I'm curious.


    TGI says moo

      Just tried on my home Ubuntu system (perl 5.8.8). Ran for a few seconds (a new record!) and crashed with

      Unable to create sub named "" at crash.pl line 15.

      This is the exact code I used:
      #!/usr/bin/perl use warnings FATAL => qw( all ); use strict; use POSIX qw( :signal_h ); $SIG{ALRM} = sub { print "SIGARLM: main handler\n" }; sub f { my $sigset = POSIX::SigSet->new; my $blockset = POSIX::SigSet->new( SIGALRM ); sigprocmask(SIG_BLOCK, $blockset, $sigset ); local $SIG{ALRM} = sub { print "SIGALRM\n" }; # line 15 sigprocmask(SIG_SETMASK, $sigset ); } if (fork) { # parent f while 1; } else { # child sleep 1; print "child starting\n"; 1 while kill ALRM => getppid; }

        I'm no master of perlguts either, but it occurs to me that there are two places where you are vulnerable to signal changes: 1. when you set the localized handler, 2. when the localization goes out of scope and the old handler is installed.

        Your code traps only one of the two vulnerable points.

        What happens if you do:

        #!/usr/bin/perl use warnings FATAL => qw( all ); use strict; use POSIX qw( :signal_h ); $SIG{ALRM} = sub { print "SIGARLM: main handler\n" }; sub f { my $sigset = POSIX::SigSet->new; my $blockset = POSIX::SigSet->new( SIGALRM ); my $old_handler = $SIG{ALRM}; # Install new handler sigprocmask(SIG_BLOCK, $blockset, $sigset ); $SIG{ALRM} = sub { print "SIGALRM\n" }; # line 15 sigprocmask(SIG_SETMASK, $sigset ); # spin a bit. my $x; $x = $_ for 1..10; # Restore old handler. sigprocmask(SIG_BLOCK, $blockset, $sigset ); $SIG{ALRM} = $old_handler; sigprocmask(SIG_SETMASK, $sigset ); } if (fork) { # parent f while 1; } else { # child sleep 1; print "child starting\n"; 1 while kill ALRM => getppid; }


        TGI says moo

        On my system (This is perl, v5.8.8 built for i386-linux-thread-multi) your script ran for many minutes, until I stopped it. I am running CentOS 5.2 with the CentOS distribution of perl.

        Is it possible that your system is not running with safe/deferred signals (environment variable PERL_SIGNALS set to "unsafe")?

Re: Setting signal handlers considered unsafe?
by ig (Vicar) on Nov 05, 2008 at 22:15 UTC
    I couldn't reproduce it under strace

    I observed the failure under strace, running on CentOS 5.2 and perl 5.8.8.

    I changed the test script, adding a couple of prints in the subroutine and a delay in the child loop. I was interested to see if SIG_DFL was being set on SIGALRM when the parent wasn't being interrupted and it is, every time the subroutine sets its handler, but not when the local %SIG goes out of scope and the original handler is restored.

    The modified code:

    Strace output while not being interrupted

    Note that when %SIG{ALRM} is set in the subroutine, sigaction() is called twice, once to set SIGALRM to SIG_DFL and then to set SIGALRM to a handler routine. Then, when the subroutine ends and the local %SIG goes out of scope, sigaction(0 is called only once, setting SIGALRM to the same handler routine. So, SIG_DFL is being set systematically every time you reset the signal handler and this has nothing to do with race conditions in the signal handler as it happens even when no signals have been received. This explains why the signal is sometimes handled by the default handler and the process terminates.

    Here is a section of strace output when an alarm was received while the subroutine handler was in place.

    Here is a section of strace output when an alarm was received while the main signal handler was in place.

    And here is the tail of the strace log when an alarm was received while the default handler was in place.

    This behaviour is all quite predictable given that SIGALRM is sometimes set to SIG_DFL.

    The question is, why is SIGALRM being set to SIG_DFL then back to perl's signal handler each time the local %SIG is set?

      This behaviour is all quite predictable given that SIGALRM is sometimes set to SIG_DFL.

      Yes, naturally.

      The question is, why is SIGALRM being set to SIG_DFL then back to perl's signal handler each time the local %SIG is set?

      Thank you for your detailed analysis. This agrees with my observation that local $SIG{...} increases the chance of "something breaking".

      The following code doesn't die immediately and doesn't set the handler to SIG_DFL anywhere (except once during what looks like early startup). I'm sure I have already tried it, but everybody makes mistakes :) I'll give it a more intensive test tomorrow.

      why is SIGALRM being set to SIG_DFL then back to perl's signal handler each time the local %SIG is set?

      It's easy to think of "local $var = $val;" one operation, but there are two. "local" and "=" are separate operations.

      local saves the current value of the variable, changes the value to undef and returns the variable as an lvalue. For $SIG{...}, the second step translates to switching to the default handler.

      Then the assignment to the lvalue returned by local occurs, changing the signal handler to the custom handler.

      The following code illustrates this clearly:

      use strict; use warnings; BEGIN { package Testing; use Tie::Scalar qw( ); BEGIN { our @ISA = qw(Tie::StdScalar); } sub STORE { my ($self, $val) = @_; print("STORE ", defined($val) ? "[$val]" : 'undef', "\n"); shift->SUPER::STORE(@_) } } { our $s; tie $s, 'Testing'; { print("pre\n"); local $s; print("post\n"); } print("\n"); { print("pre\n"); local $s = 'a'; print("post\n"); } }
      pre STORE undef # local post STORE undef # local unroll pre STORE undef # local STORE [a] # assignment post STORE undef # local unroll

      Here's a solution:

      use Sub::ScopeFinalizer qw( scope_finalizer ); sub local_sassign { my $r = \($_[0]); my $sentry = scope_finalizer { $$r = $_[0] } { args => [ $$r ] }; $$r = $_[1]; return $sentry; } sub f { my $sentry = local_sassign $SIG{ALRM}, \&alarm_handler; ... }
Re: Setting signal handlers considered unsafe?
by pjotrik (Friar) on Nov 05, 2008 at 17:34 UTC
    I believe this is a problem with catching the parent in middle of setting the handler. You're manipulating the value of SIG{ALRM}, this operation (line 9) is more than probably not atomic. At the same time, you're trying to access this handler by sending a signal. The value is not well defined at this time and you get punished.

    SPECULATION: Perhaps perl first sets the handler to default, then creates a new handler structure (I don't know perl guts, but there's gonna be something like that) and then assigns it.

    By sending the signals not that often, like by changing the 1 while kill to sleep 1 while kill you get a chance that you catch the parent in a less vulnerable state and one of your handlers gets executed. Try running it a few times.

    and BTW, setting the global SIG{ALRM} instead of the local in f, it _does_ start to work, at least on my system (Fedora 8, perl 5.8.8)

Re: Setting signal handlers considered unsafe?
by pileofrogs (Priest) on Nov 05, 2008 at 19:13 UTC

    What are you actually trying to do here? What did you expect to see?

    Do you want to set a handler in the parent? You don't need to do anything fancy. Just set the handler in the parent. You can do it from a sub if you want. Don't do it in a while 1 loop, that's just asking for trouble. You don't need to local it, because it's a forked process at that point, it's totally separate from the child.

    Once you go into mult-process and multi-thread, you really have to watch out. Strange error messages that are different every time you run are pretty normal if you don't have a good understanding of the non-sequentiality of these things.

    --Pileofrogs

      Hi,

      First of all, thanks everybody for your numerous replies :)

      Second, I guess I should have written in my original post. I know a tight loop of setting a local signal handler is silly, just like calling kill in another tight loop. I don't do that in my code, really. It's just the simplest way of reproducing the issue. It happens, say, once a week in the real world and once a second in the demo code.

      Third, I do have a pretty good grasp of multiprocessing issues, including atomicity of signal handlers and (simplifying a bit) non-sharing of address space between different processes. I don't expect to learn why local $SIG{FOO} = \&bar; is silly, I expect to learn why it isn't atomic and what I can do about it.

      I have tried sigprocmask together with the totally undocumented POSIX::SigSet (does it have something like ->add?) but I can still easily "crash" perl with this code. Avoiding local and setting/resetting the handler "by hand" seemed to decrease the frequency in the real world but the race still exists.

      Also, strace shows that perl internally does mostly the same thing (block signal, call sigaction, unblock) when setting the handler so I don't think there's anything more I can do, other than cry for help in the Monastery ;)

        ... totally undocumented POSIX::SigSet ... Try POSIX
Re: Setting signal handlers considered unsafe?
by ig (Vicar) on Nov 05, 2008 at 22:22 UTC

    With perl 5.8.8 on CentOS 5.2, exposure of SIG_DFL has something to do with setting a local $SIG{ALRM}.

    #!/opt/perl/bin/perl use strict; use warnings; print "set handler 1\n"; $SIG{ALRM} = sub { print "handler 1\n"; }; print "set handler 2\n"; $SIG{ALRM} = sub { print "handler 2\n"; }; print "set handler 3 (local)\n"; { local $SIG{ALRM} = sub { print "handler 3\n"; }; }

    produces the following strace output:

    write(1, "set handler 1\n", 14set handler 1 ) = 14 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0 rt_sigaction(SIGALRM, {0x80a3330, [], 0}, {SIG_DFL}, 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 write(1, "set handler 2\n", 14set handler 2 ) = 14 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0 rt_sigaction(SIGALRM, {0x80a3330, [], 0}, {0x80a3330, [], 0}, 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 write(1, "set handler 3 (local)\n", 22set handler 3 (local) ) = 22 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0 rt_sigaction(SIGALRM, {SIG_DFL}, {0x80a3330, [], 0}, 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0 rt_sigaction(SIGALRM, {0x80a3330, [], 0}, {SIG_DFL}, 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0 rt_sigaction(SIGALRM, {0x80a3330, [], 0}, {0x80a3330, [], 0}, 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 exit_group(0) = ?

    Update: It is localizing $SIG{ALRM} that sets the signal handler to SIG_DFL. Presumably a side effect of $SIG{ALRM} being set to undef when it is localized.

      It is localizing $SIG{ALRM} that sets the signal handler to SIG_DFL. Presumably a side effect of $SIG{ALRM} being set to undef when it is localized.

      That is stunningly unfriendly. Particularly as localisation is suggested in the documentation (perlipc).

      Looks like a bug to me... or at least in need of a serious (additional) caveat in the documentation ?

        It's not a side-effect. It's not a bug. local undefines its argument. Details

      I'm pleased to see that you have reported this as a bug:

        #60360: local $SIG{FOO} = sub {...}; sets signal handler to SIG_DFL
      

      Looking at the strace output:

        rt_sigprocmask(SIG_BLOCK, ALRM, [], 8) = 0
        rt_sigaction(SIGALRM, {SIG_DFL}, {0x80a3330, [], 0}, 8) = 0
        rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
      
      it appears that the SIG_BLOCK call of rt_sigprocmask is storing the mask state, and that the SIG_SETMASK call is restoring that -- certainly that's what one would hope for !

      While one waits for a fix, the work-around seems to be to wrap  local $SIG{FOO} = ... in a  SIG_BLOCK/SIG_SETMASK pair -- as suggested by brother gnosek in 721812, above:

      use POSIX qw(:signal_h) ; my $sigset = POSIX::SigSet->new ; my $blockset = POSIX::SigSet->new( SIGALRM ) ; sigprocmask(SIG_BLOCK, $blockset, $sigset ); local $SIG{ALRM} = sub .... ; sigprocmask(SIG_SETMASK, $sigset );

      The code below tests with and without the extra masking. With the extra masking I haven't seen it fail. Without the extra masking it fails almost instantly with a "TWIDDLE" of 1, but seems to last longer for larger "TWIDDLE".

      What I find remarkable is how quickly this fails when the extra masking is disabled. SIG_DFL is set only briefly. On the assumption that the ALRM signals are essentially random wrt the main process, I'd expect the time to fail to be more or less related to the time that SIG_DFL is set.

      I may be underestimating how long the various system calls take, compared to the small amount of Perl code. Or there may be some part of the system calls causing some synchronisation between the processes, or a greater likelihood of collecting an ALRM at the SIG_SETMASK call...

      The $TWIDDLE argument increases the time spent with the "temporary" ALRM handler set. Increasing that value does appear to increase how long it takes before the SIG_DFL handler is invoked (in the absense of the extra masking) and the program comes to a sudden Alarm clock halt.

      I have not seen the error:

        Unable to create sub named "" at ....
      
      reported in 721812, but brother gnosek is running 5.8.8 while I am running 5.10.0. I wonder if it's to do with creating a very large number of anonymous subroutines ?

        I'm pleased to see that you have reported this as a bug:

        If perl were new, I might argue that local should not set variables when localizing (to undef or anything else and regardless of whether there is an initializer), as initializers are available to set the value and including this feature in local precludes not changing the value. But perl is not new and changing local now would no doubt be problematic.

        After reading the further posts in this thread, I am inclined to the view that the design of local is less than ideal but that the only bug is in the documentation. In which case I guess I should submit a patch to the docs to go along with my bug report. I am thinking perlsub and perlipc should probably both be changed. Are there other sections that need attention?

        Maybe what is needed is a new operator, very much like local except that it does not change the value of the variable. It's description in perlsub would be exactly as for local except for the definition of what happens if there is no initializer, which might read as follows for the new operator:

        If no initializer is given for a particular variable its value is not changed if it already exists or, if it does not already exist, its value is set to undef.

        This would leave local as is, so no existing scripts would be broken, and it would provide a new means of managing signal handlers, and anything else where setting the variable to undef, even briefly, is undesirable.

        But this wouldn't be a bug, this would be a feature request.

        Update: just as I was about to submit patches to perlsub and perlipc, I received notice that a patch has been applied that skips setting undef if there is an initializer. I hope it doesn't upset anyone and get taken out.

Re: Setting signal handlers considered unsafe?
by ig (Vicar) on Nov 05, 2008 at 23:08 UTC

    You might try the following on your system and let us know what happens...

    Update: I have now run this script for well over an hour without any faults. There may still be race conditions as the signal handlers are changed, but the windows of vulnerability are much smaller than with the original script. I have not seen the Unable to create sub named "" error at any time.

      Apparently not using local is enough to prevent the crashes. sigprocmask also solves the issue and I have been confused about the root cause of my problems.

      It looks like the signal handler (essentially calling die) leaked out of the eval { } block. Or doesn't it?

      This code:

      produces results like:
      1..1 # parent entering wrap_sigs loop # child starting signal storm SIGALRM SIGALRM END failed--call queue aborted.

      Sorry for shifting the goalposts but have you got any ideas?

        I tweaked you code a little to get a view of what was happening:

        my $rcvd = 0 ; # added counter for signals seen in eval's handle +r sub wrap_sigs { ... eval { for my $sig (keys %$signals) { $old_sighandlers{$sig} = ($SIG{$sig} || 'DEFAULT'); $SIG{$sig} = sub { $rcvd++ ; die ("SIG $sig $rcvd\n") }; ...
        and the result was:
        1..1
        # parent entering wrap_sigs loop
        # child starting signal storm
        SIG ALRM 5
        # Looks like your test died before it could output anything.
        
        from which I conclude that a number of ALRM signals were trapped in the eval, but eventually Perl left eval state, with the die handler still set.

        Running the thing a number of times I got quite a wide range of numbers of ALRM signals swallowed while in the eval.

        have you got any ideas?

        The problem is that the signal handler is not being reset when the eval is terminated by the die in the signal handler, so the solution is to reset the signal handler in the signal handler.

        I modified your script to do this and it survived the signal storm 10 times out of 10 on my system.

        I also added a few print statements to see how often and when various bits executed. I was surprised how few times a signal was caught: quite reliably between 10 and 15 times, despite how quickly the two loops execute. I guess this has to do with how often the process scheduler switches running processes on my single CPU system.

Re: Setting signal handlers considered unsafe?
by gone2015 (Deacon) on Nov 05, 2008 at 18:36 UTC
    sub f { local $SIG{ALRM} = sub { print "SIGALRM\n" }; # line 9 } if (fork) { # parent f while 1; } else {

    The f while 1 will call f repeatedly and for ever. I don't know why you'd want to keep setting $SIG{ALRM} over and over again. I also guess that the local will use a small amount of memory each time....

    Update: I apologize: I had missed the point that the mystery was that setting $SIG{ALRM} in the way shown seemed to expose the default setting, despite there being a previous setting replacing the default. While I'm at it, I can find no evidence to support the suggestion that the local might be eating memory, so I retract that entirely <blush>. (I think I'll go and have a little lie down, now.)

Re: Setting signal handlers considered unsafe?
by perreal (Monk) on Nov 05, 2008 at 21:37 UTC
    On my machine:
    perl, v5.10.0 Linux localhost.localdomain 2.6.26.5-45.fc9.i686 #1 SMP i686 i386
    I got the following flavour of results:
    localhost$ perl d.pl child starting SIGARLM: main handler SIGARLM: main handler SIGARLM: main handler Maximal count of pending signals (120) exceeded at d.pl line 14. localhost$ perl d.pl child starting Alarm clock localhost$ perl d.pl child starting Maximal count of pending signals (120) exceeded at d.pl line 9. SIGARLM: main handler localhost$ perl d.pl child starting Alarm clock
    What exactly are you trying to do?
      What exactly are you trying to do?

      I want to ensure that at every moment of my program's life signals are safely handled. I have already experienced strange errors which boil down to non-atomicity of setting signal handlers (a signal gets delivered in the middle of setting a handler, thoroughly confusing perl).

Re: Setting signal handlers considered unsafe?
by ikegami (Patriarch) on Nov 12, 2008 at 18:59 UTC
Re: Setting signal handlers considered unsafe? (5.8.9)
by ikegami (Patriarch) on Dec 17, 2008 at 20:51 UTC
    perl589delta claims "A new entry has been added to the MAGIC vtable - svt_local. This is used when copying magic to the new value during local, allowing certain problems with localising shared variables to be resolved."

    I thought it might relate to this issue, but I don't seem to.

    >c:\progs\perl588\bin\perl 722505.pl pre STORE undef post STORE undef pre STORE undef STORE [a] post STORE undef >c:\progs\perl589\bin\perl 722505.pl pre STORE undef post STORE undef pre STORE undef <-- Still there STORE [a] post STORE undef

    The code is in Re^2: Setting signal handlers considered unsafe? (local assigns undef)