Re^3: double fork trick vs sig chld wait

This script dispatches jobs into a grid platform.

Does the script wait for the job to complete, or just for the submission to complete?

If the script isn't waiting for the job to complete, you shouldn't need to fork children. I've written a script to submit multiple jobs to Grid Engine by constructing a 'qsub' command and executing that via system, and the qsub command completed quickly enough that there wasn't a need to fork.
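For what it's worth, here is a minimal sketch of that approach, not the script I actually used. The job script names and the helper names `build_submit_command` and `submit_all` are made up for illustration, and it assumes `qsub` is on the PATH and accepts `-N` for the job name.

```perl
use strict;
use warnings;

# Hypothetical job scripts; substitute real paths.
my @job_scripts = ('job_a.sh', 'job_b.sh', 'job_c.sh');

# Build the argument list for one submission.
# '-N' sets the job name; the name is derived from the script name.
sub build_submit_command {
    my ($submit_cmd, $script) = @_;
    (my $name = $script) =~ s/\W+/_/g;
    return ($submit_cmd, '-N', $name, $script);
}

# Submit each script in turn and record the submit command's exit code.
sub submit_all {
    my ($submit_cmd, @scripts) = @_;
    my %status;
    for my $script (@scripts) {
        my @cmd = build_submit_command($submit_cmd, $script);
        # List-form system avoids the shell. It returns when the
        # submission command exits, not when the grid job finishes.
        system(@cmd);
        $status{$script} = $? == -1 ? -1 : $? >> 8;
    }
    return %status;
}

# Example call (needs a working qsub):
# my %status = submit_all('qsub', @job_scripts);
```

Since each `system` call only blocks for the duration of the submission, a plain loop like this is usually enough and no forking is required.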

If the grid system you're working with supports DRMAA you might want to look into Schedule::DRMAAc.


Replies are listed 'Best First'.

Re^4: double fork trick vs sig chld wait
by Voronich (Hermit) on Nov 19, 2010 at 20:18 UTC

The script waits for the executed command to complete (which blocks while waiting for the grid job to finish).

So it's semantically equivalent (at least at this level) to "run something and grab its output and return code when it finishes."
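In Perl terms, "run something and grab its output and return code" might look roughly like the sketch below. The helper name `run_and_capture` is hypothetical, and the running perl binary (`$^X`) stands in for the real control command, which isn't shown here.

```perl
use strict;
use warnings;

# Run a command, block until it finishes, and return (output, exit code).
sub run_and_capture {
    my (@cmd) = @_;
    # List-form pipe open forks and execs @cmd without going through a shell.
    open(my $fh, '-|', @cmd) or die "Cannot run '$cmd[0]': $!";
    my $output = do { local $/; <$fh> };   # slurp everything the child prints
    $output = '' unless defined $output;
    close($fh);                            # waits for the child and sets $?
    my $exit_code = $? >> 8;
    return ($output, $exit_code);
}

# Stand-in command: prints one line and exits with status 3.
my ($out, $rc) = run_and_capture($^X, '-e', 'print "done\n"; exit 3');
print "output=$out", "exit=$rc\n";
```

The call blocks for as long as the command runs, which is exactly why one such call per job needs its own child process if several jobs are to be in flight at once.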

The voodoo of the control process that this script is executing and its relationship to the grid service is entirely a black box to the dispatcher.

The other wrinkle is that there are occasional business-level failures of the jobs. I'm not expected to clean up any more than I can from those. But I don't want to take the thread pool route because of the possibility of polluting the dispatch loop.

The more articles and book chapters I read, the more it seems that the way to go is to set $SIG{CHLD} = sub {$zombies++;} and then add a reaper function to the while(1) loop, per Camel chapter 16.
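A rough sketch of that pattern as I understand it, not code lifted from the Camel book. The names `reaper`, `dispatch`, `%running` and `%exit_code` are my own, and the children just pause briefly and exit where the real ones would run the blocking grid command.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';    # provides WNOHANG

my $zombies = 0;            # bumped by the signal handler
my %running;                # pid => job label
my %exit_code;              # job label => exit code

# Keep the handler tiny: just note that at least one child has exited.
$SIG{CHLD} = sub { $zombies++ };

# Collect every child that has exited, without blocking.
sub reaper {
    $zombies = 0;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        my $label = delete $running{$pid};
        next unless defined $label;
        $exit_code{$label} = $? >> 8;
    }
}

# Fork one child per job; the pause and exit stand in for the real command.
sub dispatch {
    my ($label, $code) = @_;
    defined(my $pid = fork()) or die "fork failed: $!";
    if ($pid == 0) {
        select(undef, undef, undef, 0.2);   # pretend to do some work
        exit $code;
    }
    $running{$pid} = $label;
}

dispatch('job_a', 0);
dispatch('job_b', 2);

while (%running) {
    reaper() if $zombies;
    # ... dispatch more work here ...
    select(undef, undef, undef, 0.05);      # short pause; a signal may cut it short
}
```

The `waitpid(-1, WNOHANG)` loop matters because several children can exit before the handler runs once, so one signal may stand for more than one zombie.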

The docs for wait and waitpid are the kind of thing you (read: I) have to read 20 or 30 times, saying "wait, what?" each time, then go to lunch while it soaks in.

http://www.mpwilson.com/uccu/
