Cannot catch ALRM signal for timeout

Magice has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Cannot catch ALRM signal for timeout by almut (Canon) on Dec 08, 2009 at 14:09 UTC
I can replicate the problem with the following little test script: #!/usr/bin/perl sub process_scheme{ my ($cmd, $output_file) = @_; open (OUTPUT_FILE, ">", $output_file) \|\| die "cannot open log file +"; my $scheme_pid = open (WORKER, "$cmd \|") \|\| die "cannot pipe worker"; my $str; while (1) { if (eof(WORKER)){ close WORKER; close OUTPUT; return 1; } eval{ local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n require +d alarm 1; $str = <WORKER>; alarm 0; }; if ($@) { close OUTPUT_FILE; close WORKER; return 0; } print OUTPUT_FILE $str; } } my $code = '$\|=1; my $t=2; for (1..100) { printf "%.1f\n", $_$t; sele +ct undef,undef,undef,$t; }'; my $cmd = "perl -e '$code'"; process_scheme($cmd, "test.out"); [download] The command being run prints a line every two seconds, so with a timeout of one second, you'd expect the timeout to occurr (it doesn't, however...) When you run it under `strace`, i.e. `$ strace -tttf ./811706.pl` [download] the following repeating pattern can be observed: ... [pid 7341] 1260280139.166408 write(1, "2.0\n", 4 <unfinished ...> [pid 7340] 1260280139.166437 <... read resumed> "2.0\n", 4096) = 4 [pid 7340] 1260280139.166504 rt_sigaction(SIGALRM, NULL, {SIG_DFL}, 8 +) = 0 [pid 7340] 1260280139.166564 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280139.166614 rt_sigaction(SIGALRM, {SIG_DFL}, {SIG_DF +L}, 8) = 0 [pid 7340] 1260280139.166648 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280139.166678 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280139.166708 rt_sigaction(SIGALRM, {0x46ed80, [], SA_ +RESTORER, 0x2ac53d237140}, {SIG_DFL}, 8) = 0 [pid 7340] 1260280139.166744 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280139.166774 alarm(1) = 0 [pid 7340] 1260280139.166814 alarm(0) = 1 [pid 7340] 1260280139.166844 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280139.166875 rt_sigaction(SIGALRM, {SIG_DFL}, {0x46ed +80, [], SA_RESTORER, 0x2ac53d237140}, 8) = 0 [pid 7340] 1260280139.166910 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280139.166950 read(6, <unfinished ...> [pid 7341] 1260280139.166965 <... write resumed> ) = 4 [pid 7341] 1260280139.166990 select(0, NULL, NULL, NULL, {2, 0}) = 0 +(Timeout) [pid 7341] 1260280141.168299 write(1, "4.0\n", 4) = 4 [pid 7341] 1260280141.168344 select(0, NULL, NULL, NULL, {2, 0} <unfi +nished ...> [pid 7340] 1260280141.168370 <... read resumed> "4.0\n", 4096) = 4 [pid 7340] 1260280141.168413 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280141.168461 rt_sigaction(SIGALRM, {SIG_DFL}, {SIG_DF +L}, 8) = 0 [pid 7340] 1260280141.168501 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280141.168532 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280141.168566 rt_sigaction(SIGALRM, {0x46ed80, [], SA_ +RESTORER, 0x2ac53d237140}, {SIG_DFL}, 8) = 0 [pid 7340] 1260280141.168611 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280141.168640 alarm(1) = 0 [pid 7340] 1260280141.168675 alarm(0) = 1 ... [download] In particular, note that the time between `alarm(1)` and `alarm(0)` is only around 40 microseconds, with no `read` system call in between. Presumably, the reason for this is that reading is buffered, i.e. that Perl's `$str = <WORKER>` does not directly map to a system read. In other words, what happens in between the alarm system calls is that Perl reads from its internal buffer (which takes almost no time (and is invisible in the trace)), the alarm is being reset, and then* the system read is called to fill the buffer again... The theory is comfirmed by using `sysread WORKER, $str, 4` instead, in which case the timeout works as expected.	[reply] [d/l] [select]
Re^2: Cannot catch ALRM signal for timeout by ikegami (Patriarch) on Dec 08, 2009 at 15:42 UTC
and then the system read is called to fill the buffer again... You make it sound like Perl launches a thread to fill the buffer in the background. In reality, it's `eof` that reads from the file. Move the `eof` into the `eval` and the timeout will take effect.	[reply] [d/l] [select]
Re^3: Cannot catch ALRM signal for timeout by almut (Canon) on Dec 08, 2009 at 15:57 UTC
You make it sound like Perl launches a thread to fill the buffer That's not what I meant. Perl tries to fill the buffer whenever it's empty and something is attempting to read from it — which in this case is the `eof`, as you note.	[reply] [d/l]
Re^3: Cannot catch ALRM signal for timeout by Magice (Novice) on Dec 08, 2009 at 19:59 UTC
ikegami, what do you mean by "move the eof into the eval"? I try this: sub process_scheme{ my ($output_file) = @_; open (OUTPUT_FILE, ">", $output_file) \|\| die "cannot open lo +g file"; my $scheme_pid = open (WORKER, "worker_program < $input \|") \|\| die "cannot pipe worker"; my $str; while (1) { eval{ local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n r +equired alarm 1; if (eof(WORKER)){ close WORKER; close OUTPUT; return 1; } $str = <WORKER>; alarm 0; # if this line is removed, the signal is ca +ught, but outside of eval, so it halts the script }; if ($@) { close OUTPUT_FILE; close WORKER; return 0; } # process the line # some code ... # print the line to log print OUTPUT_FILE $str; } } [download] but it still does not work :( Thank you for your hint, tho	[reply] [d/l]
Re^4: Cannot catch ALRM signal for timeout by ikegami (Patriarch) on Dec 08, 2009 at 21:24 UTC
Re^5: Cannot catch ALRM signal for timeout by Magice (Novice) on Dec 08, 2009 at 22:28 UTC
Some notes below your chosen depth have not been shown here
Re^2: Cannot catch ALRM signal for timeout by Magice (Novice) on Dec 08, 2009 at 14:56 UTC
Ouch! That explains it. Thank you very much, almut!!!	[reply]
Re: Cannot catch ALRM signal for timeout by almut (Canon) on Dec 08, 2009 at 11:40 UTC
...so that if the worker takes too long What exactly is taking too long? I.e., does the worker program stop delivering output, so that the `$str = <WORKER>` blocks, or does it just keep outputting lines for too long? In the latter case, I wouldn't expect the `alarm` to work, because the alarm is always being reset immediately after the read (which takes almost no time if it succeeds)... An alternative to the alarm approach would be to fork and have the child kill the external process after a while. (I'll post an example later (unless someone else has already done so by then) — it's lunch-time here...)	[reply] [d/l] [select]
Re^2: Cannot catch ALRM signal for timeout by almut (Canon) on Dec 08, 2009 at 15:17 UTC
FWIW, here's the promised example, i.e. the alternative approach using a separate "watch" process to handle the killing: #!/usr/bin/perl sub run_with_timeout { my ($cmd, $timeout, $outref) = @_; my $rc = -1; # return code in case of timeout if (my $pid = open(my $pipe, "-\|")) { my $pid2 = fork(); die "cannot fork" unless defined $pid; if ($pid2) { @$outref = <$pipe>; close $pipe; # does an implicit wait, returns status + in $? my $normalexit = ($? & 0x7f)==0; $rc = $? >> 8 if $normalexit; kill 'INT', $pid2 if $normalexit; # watcher no longer nee +ded (cmd finished) wait; return $rc; } else { select undef,undef,undef,$timeout; kill 'INT', $pid; # kill command if still there exit; } } else { exec "exec $cmd 2>&1"; # exec, so $cmd runs under $pid } } my $cmd = <<'EOCMD'; perl -e ' $\|=1; my $t=0.2; for (0..10) { printf "%.1f\n", $_*$t; select undef,undef,undef, $t; } exit 42;'; EOCMD my $timeout = 1.5; my $output = []; my $ret = run_with_timeout($cmd, $timeout, $output); printf "return code: %d%s\n", $ret, $ret == -1 ? ' (timed out)':''; print "output of command:\n",@$output; __END__ return code: -1 (timed out) output of command: 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 [download]	[reply] [d/l]
Re^2: Cannot catch ALRM signal for timeout by Magice (Novice) on Dec 08, 2009 at 13:01 UTC
By "taking too long", I mean if the worker stops delivering output. My idea is to wait for the output of the worker, and if the worker does not write/print anything in a while, then reset it. Thank you!	[reply]
Re: Cannot catch ALRM signal for timeout by Fox (Pilgrim) on Dec 08, 2009 at 11:50 UTC
I think alarm is working but you call close on WORKER when you catch the signal, put some prints before the close statements. from close Closing a pipe also waits for the process executing on the pipe to complete..	[reply]
Re^2: Cannot catch ALRM signal for timeout by Magice (Novice) on Dec 08, 2009 at 13:03 UTC
I have tried that also (print out some string in the if checking for timeout), but it does not print anything, so I believe that it does not catch the signal.	[reply]
Re^3: Cannot catch ALRM signal for timeout by Fox (Pilgrim) on Dec 08, 2009 at 13:18 UTC
`open(WORKER,"yes >/dev/null \|"); eval{ local $SIG{ALRM} = sub { die "alarm\n" }; alarm 1; $str = <WORKER>; alarm 0; }; if ($@) { print "timeout\n"; close WORKER; return 0; }` [download] for me, this prints "timeout" and hangs on close. what happens if you run this code ? UPDATE: the eof function also hangs on pipes.	[reply] [d/l]
Re^4: Cannot catch ALRM signal for timeout by Magice (Novice) on Dec 08, 2009 at 13:24 UTC
Re^5: Cannot catch ALRM signal for timeout by Fox (Pilgrim) on Dec 08, 2009 at 13:54 UTC
Re: Cannot catch ALRM signal for timeout by DrHyde (Prior) on Dec 09, 2009 at 10:56 UTC
I've noticed similar problems in the perl test suite when building perl on some platforms. See this post on perl5-porters, which never got a response. To summarise: possible bug in perl or OS bug, work-around it by inserting a micro-sleep using four-arg select() in your loop.	[reply]