Re: Setting signal handlers considered unsafe?
by TGI (Parson) on Nov 05, 2008 at 18:45 UTC
|
Have you tried any of the signal stuff in POSIX? I'm thinking of sigprocmask in particular.
I'm no signal wizard, but try suspending the ALRM signal while you set the handler. When you reenable the ALRM handler, any pending ALRMs should be triggered.
Let us know what you come up with, I'm curious.
| [reply] [d/l] |
|
|
Unable to create sub named "" at crash.pl line 15.
This is the exact code I used:
#!/usr/bin/perl
use warnings FATAL => qw( all );
use strict;
use POSIX qw( :signal_h );
$SIG{ALRM} = sub { print "SIGARLM: main handler\n" };
sub f {
my $sigset = POSIX::SigSet->new;
my $blockset = POSIX::SigSet->new( SIGALRM );
sigprocmask(SIG_BLOCK, $blockset, $sigset );
local $SIG{ALRM} = sub { print "SIGALRM\n" }; # line 15
sigprocmask(SIG_SETMASK, $sigset );
}
if (fork) {
# parent
f while 1;
} else {
# child
sleep 1;
print "child starting\n";
1 while kill ALRM => getppid;
}
| [reply] [d/l] [select] |
|
|
I'm no master of perlguts either, but it occurs to me that there are two places where you are vulnerable to signal changes: 1. when you set the localized handler, 2. when the localization goes out of scope and the old handler is installed.
Your code traps only one of the two vulnerable points.
What happens if you do:
#!/usr/bin/perl
use warnings FATAL => qw( all );
use strict;
use POSIX qw( :signal_h );
$SIG{ALRM} = sub { print "SIGARLM: main handler\n" };
sub f {
my $sigset = POSIX::SigSet->new;
my $blockset = POSIX::SigSet->new( SIGALRM );
my $old_handler = $SIG{ALRM};
# Install new handler
sigprocmask(SIG_BLOCK, $blockset, $sigset );
$SIG{ALRM} = sub { print "SIGALRM\n" }; # line 15
sigprocmask(SIG_SETMASK, $sigset );
# spin a bit.
my $x;
$x = $_ for 1..10;
# Restore old handler.
sigprocmask(SIG_BLOCK, $blockset, $sigset );
$SIG{ALRM} = $old_handler;
sigprocmask(SIG_SETMASK, $sigset );
}
if (fork) {
# parent
f while 1;
} else {
# child
sleep 1;
print "child starting\n";
1 while kill ALRM => getppid;
}
| [reply] [d/l] |
|
|
On my system (This is perl, v5.8.8 built for i386-linux-thread-multi) your script ran for many minutes, until I stopped it. I am running CentOS 5.2 with the CentOS distribution of perl.
Is it possible that your system is not running with safe/deferred signals (environment variable PERL_SIGNALS set to "unsafe")?
| [reply] |
|
|
Re: Setting signal handlers considered unsafe?
by ig (Vicar) on Nov 05, 2008 at 22:15 UTC
|
I couldn't reproduce it under strace
I observed the failure under strace, running on CentOS 5.2 and perl 5.8.8.
I changed the test script, adding a couple of prints in the subroutine and a delay in the child loop. I was interested to see if SIG_DFL was being set on SIGALRM when the parent wasn't being interrupted and it is, every time the subroutine sets its handler, but not when the local %SIG goes out of scope and the original handler is restored.
The modified code:
Strace output while not being interrupted
Note that when %SIG{ALRM} is set in the subroutine, sigaction() is called twice, once to set SIGALRM to SIG_DFL and then to set SIGALRM to a handler routine. Then, when the subroutine ends and the local %SIG goes out of scope, sigaction(0 is called only once, setting SIGALRM to the same handler routine. So, SIG_DFL is being set systematically every time you reset the signal handler and this has nothing to do with race conditions in the signal handler as it happens even when no signals have been received. This explains why the signal is sometimes handled by the default handler and the process terminates.
Here is a section of strace output when an alarm was received while the subroutine handler was in place.
Here is a section of strace output when an alarm was received while the main signal handler was in place.
And here is the tail of the strace log when an alarm was received while the default handler was in place.
This behaviour is all quite predictable given that SIGALRM is sometimes set to SIG_DFL.
The question is, why is SIGALRM being set to SIG_DFL then back to perl's signal handler each time the local %SIG is set?
| [reply] [d/l] [select] |
|
|
This behaviour is all quite predictable given that SIGALRM is sometimes set to SIG_DFL.
Yes, naturally.
The question is, why is SIGALRM being set to SIG_DFL then back to perl's signal handler each time the local %SIG is set?
Thank you for your detailed analysis. This agrees with my observation that local $SIG{...} increases the chance of "something breaking".
The following code doesn't die immediately and doesn't set the handler to SIG_DFL anywhere (except once during what looks like early startup). I'm sure I have already tried it, but everybody makes mistakes :) I'll give it a more intensive test tomorrow.
| [reply] [d/l] |
|
|
why is SIGALRM being set to SIG_DFL then back to perl's signal handler each time the local %SIG is set?
It's easy to think of "local $var = $val;" one operation, but there are two. "local" and "=" are separate operations.
local saves the current value of the variable, changes the value to undef and returns the variable as an lvalue. For $SIG{...}, the second step translates to switching to the default handler.
Then the assignment to the lvalue returned by local occurs, changing the signal handler to the custom handler.
The following code illustrates this clearly:
use strict;
use warnings;
BEGIN {
package Testing;
use Tie::Scalar qw( );
BEGIN { our @ISA = qw(Tie::StdScalar); }
sub STORE {
my ($self, $val) = @_;
print("STORE ", defined($val) ? "[$val]" : 'undef', "\n");
shift->SUPER::STORE(@_)
}
}
{
our $s;
tie $s, 'Testing';
{
print("pre\n");
local $s;
print("post\n");
}
print("\n");
{
print("pre\n");
local $s = 'a';
print("post\n");
}
}
pre
STORE undef # local
post
STORE undef # local unroll
pre
STORE undef # local
STORE [a] # assignment
post
STORE undef # local unroll
Here's a solution:
use Sub::ScopeFinalizer qw( scope_finalizer );
sub local_sassign {
my $r = \($_[0]);
my $sentry = scope_finalizer { $$r = $_[0] } { args => [ $$r ] };
$$r = $_[1];
return $sentry;
}
sub f {
my $sentry = local_sassign $SIG{ALRM}, \&alarm_handler;
...
}
| [reply] [d/l] [select] |
Re: Setting signal handlers considered unsafe?
by pjotrik (Friar) on Nov 05, 2008 at 17:34 UTC
|
I believe this is a problem with catching the parent in middle of setting the handler. You're manipulating the value of SIG{ALRM}, this operation (line 9) is more than probably not atomic. At the same time, you're trying to access this handler by sending a signal. The value is not well defined at this time and you get punished.
SPECULATION: Perhaps perl first sets the handler to default, then creates a new handler structure (I don't know perl guts, but there's gonna be something like that) and then assigns it.
By sending the signals not that often, like by changing the 1 while kill to sleep 1 while kill
you get a chance that you catch the parent in a less vulnerable state and one of your handlers gets executed. Try running it a few times.
and BTW, setting the global SIG{ALRM} instead of the local in f, it _does_ start to work, at least on my system (Fedora 8, perl 5.8.8) | [reply] [d/l] [select] |
Re: Setting signal handlers considered unsafe?
by pileofrogs (Priest) on Nov 05, 2008 at 19:13 UTC
|
What are you actually trying to do here? What did you expect to see?
Do you want to set a handler in the parent? You don't need to do anything fancy. Just set the handler in the parent. You can do it from a sub if you want. Don't do it in a while 1 loop, that's just asking for trouble. You don't need to local it, because it's a forked process at that point, it's totally separate from the child.
Once you go into mult-process and multi-thread, you really have to watch out. Strange error messages that are different every time you run are pretty normal if you don't have a good understanding of the non-sequentiality of these things.
--Pileofrogs
| [reply] |
|
|
Hi,
First of all, thanks everybody for your numerous replies :)
Second, I guess I should have written in my original post. I know a tight loop of setting a local signal handler is silly, just like calling kill in another tight loop. I don't do that in my code, really. It's just the simplest way of reproducing the issue. It happens, say, once a week in the real world and once a second in the demo code.
Third, I do have a pretty good grasp of multiprocessing issues, including atomicity of signal handlers and (simplifying a bit) non-sharing of address space between different processes. I don't expect to learn why local $SIG{FOO} = \&bar; is silly, I expect to learn why it isn't atomic and what I can do about it.
I have tried sigprocmask together with the totally undocumented POSIX::SigSet (does it have something like ->add?) but I can still easily "crash" perl with this code. Avoiding local and setting/resetting the handler "by hand" seemed to decrease the frequency in the real world but the race still exists.
Also, strace shows that perl internally does mostly the same thing (block signal, call sigaction, unblock) when setting the handler so I don't think there's anything more I can do, other than cry for help in the Monastery ;)
| [reply] |
|
|
... totally undocumented POSIX::SigSet ... Try POSIX
| [reply] |
|
|
Re: Setting signal handlers considered unsafe?
by ig (Vicar) on Nov 05, 2008 at 22:22 UTC
|
With perl 5.8.8 on CentOS 5.2, exposure of SIG_DFL has something to do with setting a local $SIG{ALRM}.
#!/opt/perl/bin/perl
use strict;
use warnings;
print "set handler 1\n";
$SIG{ALRM} = sub { print "handler 1\n"; };
print "set handler 2\n";
$SIG{ALRM} = sub { print "handler 2\n"; };
print "set handler 3 (local)\n";
{
local $SIG{ALRM} = sub { print "handler 3\n"; };
}
produces the following strace output:
write(1, "set handler 1\n", 14set handler 1
) = 14
rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
rt_sigaction(SIGALRM, {0x80a3330, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(1, "set handler 2\n", 14set handler 2
) = 14
rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
rt_sigaction(SIGALRM, {0x80a3330, [], 0}, {0x80a3330, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(1, "set handler 3 (local)\n", 22set handler 3 (local)
) = 22
rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL}, {0x80a3330, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
rt_sigaction(SIGALRM, {0x80a3330, [], 0}, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) = 0
rt_sigaction(SIGALRM, {0x80a3330, [], 0}, {0x80a3330, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(0) = ?
Update: It is localizing $SIG{ALRM} that sets the signal handler to SIG_DFL. Presumably a side effect of $SIG{ALRM} being set to undef when it is localized.
| [reply] [d/l] [select] |
|
|
It is localizing $SIG{ALRM} that sets the signal handler to SIG_DFL. Presumably a side effect of $SIG{ALRM} being set to undef when it is localized.
That is stunningly unfriendly. Particularly as localisation is suggested in the documentation (perlipc).
Looks like a bug to me... or at least in need of a serious (additional) caveat in the documentation ?
| [reply] [d/l] [select] |
|
|
It's not a side-effect. It's not a bug. local undefines its argument. Details
| [reply] [d/l] |
|
|
|
|
|
|
|
#60360: local $SIG{FOO} = sub {...}; sets signal handler to SIG_DFL
Looking at the strace output:
rt_sigprocmask(SIG_BLOCK, ALRM, [], 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL}, {0x80a3330, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
it appears that the SIG_BLOCK call of rt_sigprocmask is storing the mask state, and that the SIG_SETMASK call is restoring that -- certainly that's what one would hope for !
While one waits for a fix, the work-around seems to be to wrap local $SIG{FOO} = ... in a SIG_BLOCK/SIG_SETMASK pair -- as suggested by brother gnosek in 721812, above: use POSIX qw(:signal_h) ;
my $sigset = POSIX::SigSet->new ;
my $blockset = POSIX::SigSet->new( SIGALRM ) ;
sigprocmask(SIG_BLOCK, $blockset, $sigset );
local $SIG{ALRM} = sub .... ;
sigprocmask(SIG_SETMASK, $sigset );
The code below tests with and without the extra masking. With the extra masking I haven't seen it fail. Without the extra masking it fails almost instantly with a "TWIDDLE" of 1, but seems to last longer for larger "TWIDDLE".
What I find remarkable is how quickly this fails when the extra masking is disabled. SIG_DFL is set only briefly. On the assumption that the ALRM signals are essentially random wrt the main process, I'd expect the time to fail to be more or less related to the time that SIG_DFL is set.
I may be underestimating how long the various system calls take, compared to the small amount of Perl code. Or there may be some part of the system calls causing some synchronisation between the processes, or a greater likelihood of collecting an ALRM at the SIG_SETMASK call...
The $TWIDDLE argument increases the time spent with the "temporary" ALRM handler set. Increasing that value does appear to increase how long it takes before the SIG_DFL handler is invoked (in the absense of the extra masking) and the program comes to a sudden Alarm clock halt.
I have not seen the error:
Unable to create sub named "" at ....
reported in 721812, but brother gnosek is running 5.8.8 while I am running 5.10.0. I wonder if it's to do with creating a very large number of anonymous subroutines ?
| [reply] [d/l] [select] |
|
|
I'm pleased to see that you have reported this as a bug:
If perl were new, I might argue that local should not set variables when localizing (to undef or anything else and regardless of whether there is an initializer), as initializers are available to set the value and including this feature in local precludes not changing the value. But perl is not new and changing local now would no doubt be problematic.
After reading the further posts in this thread, I am inclined to the view that the design of local is less than ideal but that the only bug is in the documentation. In which case I guess I should submit a patch to the docs to go along with my bug report. I am thinking perlsub and perlipc should probably both be changed. Are there other sections that need attention?
Maybe what is needed is a new operator, very much like local except that it does not change the value of the variable. It's description in perlsub would be exactly as for local except for the definition of what happens if there is no initializer, which might read as follows for the new operator:
If no initializer is given for a particular variable its value is not changed if it already exists or, if it does not already exist, its value is set to undef.
This would leave local as is, so no existing scripts would be broken, and it would provide a new means of managing signal handlers, and anything else where setting the variable to undef, even briefly, is undesirable.
But this wouldn't be a bug, this would be a feature request.
Update: just as I was about to submit patches to perlsub and perlipc,
I received notice that a patch has been applied that skips setting undef if there is an initializer. I hope it doesn't upset anyone and get taken out.
| [reply] [d/l] |
Re: Setting signal handlers considered unsafe?
by ig (Vicar) on Nov 05, 2008 at 23:08 UTC
|
You might try the following on your system and let us know what happens...
Update: I have now run this script for well over an hour without any faults. There may still be race conditions as the signal handlers are changed, but the windows of vulnerability are much smaller than with the original script. I have not seen the Unable to create sub named "" error at any time.
| [reply] [d/l] |
|
|
Apparently not using local is enough to prevent the crashes. sigprocmask also solves the issue and I have been confused about the root cause of my problems.
It looks like the signal handler (essentially calling die) leaked out of the eval { } block. Or doesn't it?
This code:
produces results like:
1..1
# parent entering wrap_sigs loop
# child starting signal storm
SIGALRM
SIGALRM
END failed--call queue aborted.
Sorry for shifting the goalposts but have you got any ideas? | [reply] [d/l] [select] |
|
|
my $rcvd = 0 ; # added counter for signals seen in eval's handle
+r
sub wrap_sigs {
...
eval {
for my $sig (keys %$signals) {
$old_sighandlers{$sig} = ($SIG{$sig} || 'DEFAULT');
$SIG{$sig} = sub { $rcvd++ ; die ("SIG $sig $rcvd\n") };
...
and the result was:
1..1
# parent entering wrap_sigs loop
# child starting signal storm
SIG ALRM 5
# Looks like your test died before it could output anything.
from which I conclude that a number of ALRM signals were trapped in the eval, but eventually Perl left eval state, with the die handler still set.
Running the thing a number of times I got quite a wide range of numbers of ALRM signals swallowed while in the eval.
| [reply] [d/l] [select] |
|
|
|
|
|
|
have you got any ideas?
The problem is that the signal handler is not being reset when the eval is terminated by the die in the signal handler, so the solution is to reset the signal handler in the signal handler.
I modified your script to do this and it survived the signal storm 10 times out of 10 on my system.
I also added a few print statements to see how often and when various bits executed. I was surprised how few times a signal was caught: quite reliably between 10 and 15 times, despite how quickly the two loops execute. I guess this has to do with how often the process scheduler switches running processes on my single CPU system.
| [reply] [d/l] [select] |
|
|
Re: Setting signal handlers considered unsafe?
by gone2015 (Deacon) on Nov 05, 2008 at 18:36 UTC
|
sub f {
local $SIG{ALRM} = sub { print "SIGALRM\n" }; # line 9
}
if (fork) {
# parent
f while 1;
} else {
The f while 1 will call f repeatedly and for ever. I don't know why you'd want to keep setting $SIG{ALRM} over and over again. I also guess that the local will use a small amount of memory each time....
Update: I apologize: I had missed the point that the mystery was that setting $SIG{ALRM} in the way shown seemed to expose the default setting, despite there being a previous setting replacing the default. While I'm at it, I can find no evidence to support the suggestion that the local might be eating memory, so I retract that entirely <blush>. (I think I'll go and have a little lie down, now.)
| [reply] [d/l] [select] |
Re: Setting signal handlers considered unsafe?
by perreal (Monk) on Nov 05, 2008 at 21:37 UTC
|
perl, v5.10.0
Linux localhost.localdomain 2.6.26.5-45.fc9.i686 #1 SMP i686 i386
I got the following flavour of results:
localhost$ perl d.pl
child starting
SIGARLM: main handler
SIGARLM: main handler
SIGARLM: main handler
Maximal count of pending signals (120) exceeded at d.pl line 14.
localhost$ perl d.pl
child starting
Alarm clock
localhost$ perl d.pl
child starting
Maximal count of pending signals (120) exceeded at d.pl line 9.
SIGARLM: main handler
localhost$ perl d.pl
child starting
Alarm clock
What exactly are you trying to do? | [reply] [d/l] [select] |
|
|
| [reply] |
Re: Setting signal handlers considered unsafe?
by ikegami (Patriarch) on Nov 12, 2008 at 18:59 UTC
|
| [reply] |
Re: Setting signal handlers considered unsafe? (5.8.9)
by ikegami (Patriarch) on Dec 17, 2008 at 20:51 UTC
|
perl589delta claims "A new entry has been added to the MAGIC vtable - svt_local. This is used when copying magic to the new value during local, allowing certain problems with localising shared variables to be resolved."
I thought it might relate to this issue, but I don't seem to.
>c:\progs\perl588\bin\perl 722505.pl
pre
STORE undef
post
STORE undef
pre
STORE undef
STORE [a]
post
STORE undef
>c:\progs\perl589\bin\perl 722505.pl
pre
STORE undef
post
STORE undef
pre
STORE undef <-- Still there
STORE [a]
post
STORE undef
The code is in Re^2: Setting signal handlers considered unsafe? (local assigns undef) | [reply] [d/l] |