Magice has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I am trying to write a script which reads from STDOUT of a worker program with timeout so that if the worker takes too long, the script can kill it and retry. However, for some reasons, the ALRM signal does not seem to be caught. I have dug up perlfunc and google, and literally copy-and-paste the eval block, which still does not work. Here is the code:
sub process_scheme{ my ($output_file) = @_; open (OUTPUT_FILE, ">", $output_file) || die "cannot open lo +g file"; my $scheme_pid = open (WORKER, "worker_program < $input |") || die "cannot pipe worker"; my $str; while (1) { if (eof(WORKER)){ close WORKER; close OUTPUT; return 1; } eval{ local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n r +equired alarm 1; $str = <WORKER>; alarm 0; # if this line is removed, the signal is ca +ught, but outside of eval, so it halts the script }; if ($@) { close OUTPUT_FILE; close WORKER; return 0; } # process the line # some code ... # print the line to log print OUTPUT_FILE $str; } }
Can some monks please help me? Thank you!

Replies are listed 'Best First'.
Re: Cannot catch ALRM signal for timeout
by almut (Canon) on Dec 08, 2009 at 14:09 UTC

    I can replicate the problem with the following little test script:

    #!/usr/bin/perl sub process_scheme{ my ($cmd, $output_file) = @_; open (OUTPUT_FILE, ">", $output_file) || die "cannot open log file +"; my $scheme_pid = open (WORKER, "$cmd |") || die "cannot pipe worker"; my $str; while (1) { if (eof(WORKER)){ close WORKER; close OUTPUT; return 1; } eval{ local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n require +d alarm 1; $str = <WORKER>; alarm 0; }; if ($@) { close OUTPUT_FILE; close WORKER; return 0; } print OUTPUT_FILE $str; } } my $code = '$|=1; my $t=2; for (1..100) { printf "%.1f\n", $_*$t; sele +ct undef,undef,undef,$t; }'; my $cmd = "perl -e '$code'"; process_scheme($cmd, "test.out");

    The command being run prints a line every two seconds, so with a timeout of one second, you'd expect the timeout to occurr  (it doesn't, however...)

    When you run it under strace, i.e.

    $ strace -tttf ./811706.pl

    the following repeating pattern can be observed:

    ... [pid 7341] 1260280139.166408 write(1, "2.0\n", 4 <unfinished ...> [pid 7340] 1260280139.166437 <... read resumed> "2.0\n", 4096) = 4 [pid 7340] 1260280139.166504 rt_sigaction(SIGALRM, NULL, {SIG_DFL}, 8 +) = 0 [pid 7340] 1260280139.166564 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280139.166614 rt_sigaction(SIGALRM, {SIG_DFL}, {SIG_DF +L}, 8) = 0 [pid 7340] 1260280139.166648 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280139.166678 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280139.166708 rt_sigaction(SIGALRM, {0x46ed80, [], SA_ +RESTORER, 0x2ac53d237140}, {SIG_DFL}, 8) = 0 [pid 7340] 1260280139.166744 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280139.166774 alarm(1) = 0 [pid 7340] 1260280139.166814 alarm(0) = 1 [pid 7340] 1260280139.166844 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280139.166875 rt_sigaction(SIGALRM, {SIG_DFL}, {0x46ed +80, [], SA_RESTORER, 0x2ac53d237140}, 8) = 0 [pid 7340] 1260280139.166910 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280139.166950 read(6, <unfinished ...> [pid 7341] 1260280139.166965 <... write resumed> ) = 4 [pid 7341] 1260280139.166990 select(0, NULL, NULL, NULL, {2, 0}) = 0 +(Timeout) [pid 7341] 1260280141.168299 write(1, "4.0\n", 4) = 4 [pid 7341] 1260280141.168344 select(0, NULL, NULL, NULL, {2, 0} <unfi +nished ...> [pid 7340] 1260280141.168370 <... read resumed> "4.0\n", 4096) = 4 [pid 7340] 1260280141.168413 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280141.168461 rt_sigaction(SIGALRM, {SIG_DFL}, {SIG_DF +L}, 8) = 0 [pid 7340] 1260280141.168501 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280141.168532 rt_sigprocmask(SIG_BLOCK, [ALRM], [], 8) + = 0 [pid 7340] 1260280141.168566 rt_sigaction(SIGALRM, {0x46ed80, [], SA_ +RESTORER, 0x2ac53d237140}, {SIG_DFL}, 8) = 0 [pid 7340] 1260280141.168611 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) + = 0 [pid 7340] 1260280141.168640 alarm(1) = 0 [pid 7340] 1260280141.168675 alarm(0) = 1 ...

    In particular, note that the time between alarm(1) and alarm(0) is only around 40 microseconds, with no read system call in between.

    Presumably, the reason for this is that reading is buffered, i.e. that Perl's $str = <WORKER> does not directly map to a system read. In other words, what happens in between the alarm system calls is that Perl reads from its internal buffer (which takes almost no time (and is invisible in the trace)), the alarm is being reset, and then the system read is called to fill the buffer again...

    The theory is comfirmed by using sysread WORKER, $str, 4 instead, in which case the timeout works as expected.

      and then the system read is called to fill the buffer again...

      You make it sound like Perl launches a thread to fill the buffer in the background. In reality, it's eof that reads from the file.

      Move the eof into the eval and the timeout will take effect.

        You make it sound like Perl launches a thread to fill the buffer

        That's not what I meant.  Perl tries to fill the buffer whenever it's empty and something is attempting to read from it — which in this case is the eof, as you note.

        ikegami, what do you mean by "move the eof into the eval"? I try this:
        sub process_scheme{ my ($output_file) = @_; open (OUTPUT_FILE, ">", $output_file) || die "cannot open lo +g file"; my $scheme_pid = open (WORKER, "worker_program < $input |") || die "cannot pipe worker"; my $str; while (1) { eval{ local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n r +equired alarm 1; if (eof(WORKER)){ close WORKER; close OUTPUT; return 1; } $str = <WORKER>; alarm 0; # if this line is removed, the signal is ca +ught, but outside of eval, so it halts the script }; if ($@) { close OUTPUT_FILE; close WORKER; return 0; } # process the line # some code ... # print the line to log print OUTPUT_FILE $str; } }
        but it still does not work :( Thank you for your hint, tho
      Ouch! That explains it. Thank you very much, almut!!!
Re: Cannot catch ALRM signal for timeout
by almut (Canon) on Dec 08, 2009 at 11:40 UTC
    ...so that if the worker takes too long

    What exactly is taking too long? I.e., does the worker program stop delivering output, so that the $str = <WORKER> blocks, or does it just keep outputting lines for too long?  In the latter case, I wouldn't expect the alarm to work, because the alarm is always being reset immediately after the read (which takes almost no time if it succeeds)...

    An alternative to the alarm approach would be to fork and have the child kill the external process after a while.  (I'll post an example later (unless someone else has already done so by then) — it's lunch-time here...)

      FWIW, here's the promised example, i.e. the alternative approach using a separate "watch" process to handle the killing:

      #!/usr/bin/perl sub run_with_timeout { my ($cmd, $timeout, $outref) = @_; my $rc = -1; # return code in case of timeout if (my $pid = open(my $pipe, "-|")) { my $pid2 = fork(); die "cannot fork" unless defined $pid; if ($pid2) { @$outref = <$pipe>; close $pipe; # does an implicit wait, returns status + in $? my $normalexit = ($? & 0x7f)==0; $rc = $? >> 8 if $normalexit; kill 'INT', $pid2 if $normalexit; # watcher no longer nee +ded (cmd finished) wait; return $rc; } else { select undef,undef,undef,$timeout; kill 'INT', $pid; # kill command if still there exit; } } else { exec "exec $cmd 2>&1"; # exec, so $cmd runs under $pid } } my $cmd = <<'EOCMD'; perl -e ' $|=1; my $t=0.2; for (0..10) { printf "%.1f\n", $_*$t; select undef,undef,undef, $t; } exit 42;'; EOCMD my $timeout = 1.5; my $output = []; my $ret = run_with_timeout($cmd, $timeout, $output); printf "return code: %d%s\n", $ret, $ret == -1 ? ' (timed out)':''; print "output of command:\n",@$output; __END__ return code: -1 (timed out) output of command: 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4
      By "taking too long", I mean if the worker stops delivering output. My idea is to wait for the output of the worker, and if the worker does not write/print anything in a while, then reset it. Thank you!
Re: Cannot catch ALRM signal for timeout
by Fox (Pilgrim) on Dec 08, 2009 at 11:50 UTC
    I think alarm is working but you call close on WORKER when you catch the signal, put some prints before the close statements.

    from close
    Closing a pipe also waits for the process executing on the pipe to complete..
      I have tried that also (print out some string in the if checking for timeout), but it does not print anything, so I believe that it does not catch the signal.
        open(WORKER,"yes >/dev/null |"); eval{ local $SIG{ALRM} = sub { die "alarm\n" }; alarm 1; $str = <WORKER>; alarm 0; }; if ($@) { print "timeout\n"; close WORKER; return 0; }
        for me, this prints "timeout" and hangs on close.
        what happens if you run this code ?

        UPDATE: the eof function also hangs on pipes.
Re: Cannot catch ALRM signal for timeout
by DrHyde (Prior) on Dec 09, 2009 at 10:56 UTC

    I've noticed similar problems in the perl test suite when building perl on *some* platforms. See this post on perl5-porters, which never got a response.

    To summarise: possible bug in perl or OS bug, work-around it by inserting a micro-sleep using four-arg select() in your loop.