I was wondering the same thing... Especially if you're not waiting for it to finish, how can you ever know if it is hung or not?
I've used Parallel::ForkManager in the past to spawn processes, but it also has a 'wait_all_children' function that makes the parent wait until all of the child processes have finished. In that case, you can have each child process return a value denoting whether it completed successfully, errored, returned invalid results, etc.
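To make the "children report back" idea concrete, here is a minimal sketch of the same pattern using only core fork and wait, so it runs on a stock Perl. It is not Parallel::ForkManager itself. The sub name run_and_wait and the exit-code convention (0 = success, 1 = error, 2 = invalid results) are my own assumptions for illustration.

```perl
use strict;
use warnings;

# Run each task in its own child process and block until all have exited.
# Returns a hash mapping task name to that child's exit code.
sub run_and_wait {
    my (%tasks) = @_;              # name => code ref returning an exit code
    my %name_for_pid;

    for my $name (sort keys %tasks) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: its exit code is the status report to the parent.
            exit $tasks{$name}->();
        }
        $name_for_pid{$pid} = $name;
    }

    my %exit_code_for;
    while ((my $pid = wait()) != -1) {      # -1 means no children are left
        next unless exists $name_for_pid{$pid};
        # The high byte of $? is the exit code. A child killed by a signal
        # would also need ($? & 127) to be checked.
        $exit_code_for{ $name_for_pid{$pid} } = $? >> 8;
    }
    return %exit_code_for;
}

# Assumed convention: 0 = success, 1 = error, 2 = invalid results.
my %status = run_and_wait(
    good    => sub { 0 },
    failed  => sub { 1 },
    invalid => sub { 2 },
);
print "$_ exited with $status{$_}\n" for sort keys %status;
```

The parent only learns the outcome because it waits. Without the wait loop the exit codes are never collected.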
Of course it depends on what you're trying to accomplish, but if you don't wait for the children to finish, I can't imagine that you'd know whether or not they ever really did finish.
| [reply] |
avanta
I've not done it in Perl before, but plenty of times in C/C++. In cases like this, I generally have a parent fork off children, one for each independent task. The parent (original) then monitors the status of the child process(es). Typically, I also have a few shared variables that the children use to advertise any internal state the parent may be interested in.
As I said, I've not done it in Perl before, as none of my code has needed it so far. But IIRC, there are several thread/process management packages on CPAN that may be of some help to you.
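For what it's worth, the parent-monitors-children pattern can be sketched in Perl with core modules only. This is a rough illustration, not a tested production design. It assumes "hung" simply means "still running after a timeout", the sub name monitor_children is made up, and the shared status variables are left out.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep time);

# Fork one child per task, then poll the children without blocking.
# A child still running after $timeout seconds is treated as hung and killed.
# Returns name => 'ok' | 'failed' | 'hung'.
sub monitor_children {
    my ($timeout, %tasks) = @_;
    my (%name_for_pid, %result);

    for my $name (sort keys %tasks) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) { exit $tasks{$name}->() }
        $name_for_pid{$pid} = $name;
    }

    my $deadline = time() + $timeout;
    while (%name_for_pid) {
        for my $pid (keys %name_for_pid) {
            my $reaped = waitpid($pid, WNOHANG);    # 0 means still running
            if ($reaped == $pid) {
                $result{ delete $name_for_pid{$pid} } = $? == 0 ? 'ok' : 'failed';
            }
            elsif (time() > $deadline) {
                kill 'KILL', $pid;
                waitpid($pid, 0);                   # reap it so no zombie is left
                $result{ delete $name_for_pid{$pid} } = 'hung';
            }
        }
        sleep 0.1 if %name_for_pid;                 # avoid a busy loop
    }
    return %result;
}

my %result = monitor_children(
    1,                                  # timeout in seconds
    quick => sub { 0 },
    stuck => sub { sleep 60; 0 },       # simulates a hung child
);
print "$_: $result{$_}\n" for sort keys %result;
```

A fixed timeout is the crudest possible test. It cannot tell a slow child from a stuck one, which is where some kind of progress report from the children comes in.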
...roboticus
| [reply] |
...Also, here our parent may have already finished.
You could have the children write various status info to log files in a directory, e.g. with the files containing the PID in their names. At a minimum, you'd need a heartbeat message (say a timestamp, printed every minute or so, as long as everything is running ok), and a "finished" status message.
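A rough sketch of the child side, assuming a made-up file naming scheme (job.<PID>.status) and a made-up line format (epoch seconds followed by a message). The helper name write_status is hypothetical, and the loop stands in for real work. A long-running job would emit its heartbeat on a timer instead.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Append one status line ("<epoch seconds> <message>") to this process's
# log file, which is named after its PID so jobs can be told apart.
sub write_status {
    my ($dir, $message) = @_;
    my $path = File::Spec->catfile($dir, sprintf('job.%d.status', $$));
    open my $fh, '>>', $path or die "cannot open $path: $!";
    print {$fh} time(), " $message\n";
    close $fh or die "cannot close $path: $!";
    return $path;
}

# Hypothetical worker: one heartbeat per slice of work, then a final marker.
my $dir = tempdir(CLEANUP => 1);    # a real job would use a fixed directory
for my $step (1 .. 3) {
    # ... do one slice of the real work here ...
    write_status($dir, 'heartbeat');
}
my $path = write_status($dir, 'finished');
print "status written to $path\n";
```

Opening and closing the file for every message keeps each line flushed to disk, so a reader never has to guess whether a heartbeat is still sitting in a buffer.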
Something else (e.g. a cron job) could then periodically scan this directory and look for heartbeat timestamps that are out of date and not followed by a "finished" message, which would indicate that the process hangs or has somehow disappeared. It could also create a summary report and clean up the files associated with finished jobs.
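The scanning side might look roughly like this. It assumes files named job.<PID>.status whose lines are "<epoch seconds> <message>", which is only one possible format. The sub name scan_status_dir and the 180 second threshold are made up, and the summary report and cleanup are left out.

```perl
use strict;
use warnings;

# Classify every job.<pid>.status file in $dir by its last non-empty line.
# Returns pid => 'finished' | 'running' | 'stale'.
sub scan_status_dir {
    my ($dir, $max_age, $now) = @_;
    $now //= time();
    my %state;

    opendir my $dh, $dir or die "cannot open $dir: $!";
    for my $file (readdir $dh) {
        my ($pid) = $file =~ /^job\.(\d+)\.status$/ or next;

        open my $fh, '<', "$dir/$file" or die "cannot read $file: $!";
        my $last;
        while (my $line = <$fh>) { $last = $line if $line =~ /\S/ }
        close $fh;

        # Skip files that are empty or not in the assumed format.
        my ($stamp, $message) = ($last // '') =~ /^(\d+)\s+(\S+)/ or next;

        $state{$pid} = $message eq 'finished'    ? 'finished'
                     : $now - $stamp > $max_age  ? 'stale'
                     :                             'running';
    }
    closedir $dh;
    return %state;
}

# Usage from cron: perl scan.pl /path/to/status/dir
if (@ARGV) {
    my %state = scan_status_dir($ARGV[0], 180);   # 180 s: three missed beats
    print "$_: $state{$_}\n" for sort { $a <=> $b } keys %state;
}
```

A "stale" result covers both a hung process and one that died without writing "finished". Telling those apart needs an extra check, such as whether the PID still exists.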
Of course, this would only work if the children's code is under your control, so you can modify it to print heartbeats, etc.
| [reply] |
What I would do:
- Define what "hung" means and how you can detect this (hard part).
- Write a Nagios (or whatever monitoring tool you're using) script that checks for this condition (easy part).
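As a sketch of the easy part: a Nagios plugin reports its result through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus one line of output. The definition of "hung" used here, a heartbeat file that has not been modified for too long, is purely an assumption for illustration, as are the name check_heartbeat and the thresholds.

```perl
use strict;
use warnings;

# Nagios plugin exit codes.
use constant { OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 };

# Assumed definition of "hung": the heartbeat file has not been modified
# for more than $crit seconds. Returns (exit code, one-line message).
sub check_heartbeat {
    my ($file, $warn, $crit, $now) = @_;
    $now //= time();

    my $mtime = (stat $file)[9];
    return (UNKNOWN, "UNKNOWN - cannot stat $file") unless defined $mtime;

    my $age = $now - $mtime;
    return (CRITICAL, "CRITICAL - heartbeat is ${age}s old") if $age > $crit;
    return (WARNING,  "WARNING - heartbeat is ${age}s old")  if $age > $warn;
    return (OK,       "OK - heartbeat is ${age}s old");
}

# Usage: perl check_heartbeat.pl <file> <warn seconds> <crit seconds>
if (@ARGV == 3) {
    my ($code, $message) = check_heartbeat(@ARGV);
    print "$message\n";
    exit $code;
}
```

The hard part stays hard: this only works if a stale file really does mean the process is stuck.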
| [reply] |