in reply to Cannot catch ALRM signal for timeout

I can replicate the problem with the following little test script:

#!/usr/bin/perl sub process_scheme{ my ($cmd, $output_file) = @_; open (OUTPUT_FILE, ">", $output_file) || die "cannot open log file +"; my $scheme_pid = open (WORKER, "$cmd |") || die "cannot pipe worker"; my $str; while (1) { if (eof(WORKER)){ close WORKER; close OUTPUT; return 1; } eval{ local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n require +d alarm 1; $str = <WORKER>; alarm 0; }; if ($@) { close OUTPUT_FILE; close WORKER; return 0; } print OUTPUT_FILE $str; } } my $code = '$|=1; my $t=2; for (1..100) { printf "%.1f\n", $_*$t; sele +ct undef,undef,undef,$t; }'; my $cmd = "perl -e '$code'"; process_scheme($cmd, "test.out");

The command being run prints a line every two seconds, so with a timeout of one second, you'd expect the timeout to occurr  (it doesn't, however...)

When you run it under strace, i.e.

$ strace -tttf ./811706.pl

the following repeating pattern can be observed:

... [pid 7341] 1260280139.166408 write(1, "2.0\n", 4 <unfinished ...> [pid 7340] 1260280139.166437 <... read resumed> "2.0\n", 4096) = 4 [pid 7340] 1260280139.166504 rt_sigaction(SIGALRM, NULL, {SIG_DFL}, 8 +) = 0 [pid 7340] 1260280139.166564 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280139.166614 rt_sigaction(SIGALRM, {SIG_DFL}, {SIG_DF +L}, 8) = 0 [pid 7340] 1260280139.166648 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280139.166678 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280139.166708 rt_sigaction(SIGALRM, {0x46ed80, [], SA_ +RESTORER, 0x2ac53d237140}, {SIG_DFL}, 8) = 0 [pid 7340] 1260280139.166744 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280139.166774 alarm(1) = 0 [pid 7340] 1260280139.166814 alarm(0) = 1 [pid 7340] 1260280139.166844 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280139.166875 rt_sigaction(SIGALRM, {SIG_DFL}, {0x46ed +80, [], SA_RESTORER, 0x2ac53d237140}, 8) = 0 [pid 7340] 1260280139.166910 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280139.166950 read(6, <unfinished ...> [pid 7341] 1260280139.166965 <... write resumed> ) = 4 [pid 7341] 1260280139.166990 select(0, NULL, NULL, NULL, {2, 0}) = 0 +(Timeout) [pid 7341] 1260280141.168299 write(1, "4.0\n", 4) = 4 [pid 7341] 1260280141.168344 select(0, NULL, NULL, NULL, {2, 0} <unfi +nished ...> [pid 7340] 1260280141.168370 <... read resumed> "4.0\n", 4096) = 4 [pid 7340] 1260280141.168413 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280141.168461 rt_sigaction(SIGALRM, {SIG_DFL}, {SIG_DF +L}, 8) = 0 [pid 7340] 1260280141.168501 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280141.168532 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280141.168566 rt_sigaction(SIGALRM, {0x46ed80, [], SA_ +RESTORER, 0x2ac53d237140}, {SIG_DFL}, 8) = 0 [pid 7340] 1260280141.168611 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280141.168640 alarm(1) = 0 [pid 7340] 1260280141.168675 alarm(0) = 1 ...

In particular, note that the time between alarm(1) and alarm(0) is only around 40 microseconds, with no read system call in between.

Presumably, the reason for this is that reading is buffered, i.e. that Perl's $str = <WORKER> does not directly map to a system read. In other words, what happens in between the alarm system calls is that Perl reads from its internal buffer (which takes almost no time (and is invisible in the trace)), the alarm is being reset, and then the system read is called to fill the buffer again...

The theory is comfirmed by using sysread WORKER, $str, 4 instead, in which case the timeout works as expected.

Replies are listed 'Best First'.
Re^2: Cannot catch ALRM signal for timeout
by ikegami (Patriarch) on Dec 08, 2009 at 15:42 UTC

    and then the system read is called to fill the buffer again...

    You make it sound like Perl launches a thread to fill the buffer in the background. In reality, it's eof that reads from the file.

    Move the eof into the eval and the timeout will take effect.

      You make it sound like Perl launches a thread to fill the buffer

      That's not what I meant.  Perl tries to fill the buffer whenever it's empty and something is attempting to read from it — which in this case is the eof, as you note.

      ikegami, what do you mean by "move the eof into the eval"? I try this:
      sub process_scheme{ my ($output_file) = @_; open (OUTPUT_FILE, ">", $output_file) || die "cannot open lo +g file"; my $scheme_pid = open (WORKER, "worker_program < $input |") || die "cannot pipe worker"; my $str; while (1) { eval{ local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n r +equired alarm 1; if (eof(WORKER)){ close WORKER; close OUTPUT; return 1; } $str = <WORKER>; alarm 0; # if this line is removed, the signal is ca +ught, but outside of eval, so it halts the script }; if ($@) { close OUTPUT_FILE; close WORKER; return 0; } # process the line # some code ... # print the line to log print OUTPUT_FILE $str; } }
      but it still does not work :( Thank you for your hint, tho

        but it still does not work

        What doesn't work?

        The timeout works fine. I changed

        my $scheme_pid = open (WORKER, "worker_program < $input |")
        to
        my $scheme_pid = open (WORKER, "-|", $^X, '-e', '$|=1; for (1..100) { +print "%d\n", $_*$t; sleep(2) }')

        and the timeout triggered. The log file contained only "0".

        You'll need to fix the return 1, though. That will only return out of the eval, not out of the sub.

Re^2: Cannot catch ALRM signal for timeout
by Magice (Novice) on Dec 08, 2009 at 14:56 UTC
    Ouch! That explains it. Thank you very much, almut!!!