gmol has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I'm using a Perl script to oversee the plumbing of a set of related jobs being submitted to a PBS cluster. The issue is quite simple, but I'm having difficulty figuring out the One True Way to do it... The program simply splits up an input file, calls a binary on each partition, waits until every job has finished, pools the results, and then uses those as input to the next cycle.
for ($i = 0; $i < 5; $i++) {
    for ($j = 0; $j < 60; $j++) {
        # NB: system() returns qsub's exit status, not its output
        $output[$j] = system("qsub $input[$j]");
    }
    # wait until each of the previous jobs finishes
    @input = pool_output(@output);
}
Now I can't figure out how to employ what is basically an observer pattern, so that only after every qsub job has finished do I pool the output and iterate, without polling. Remember that qsub returns almost immediately after submitting the job (the job itself may not have finished). Here are some of the ideas so far:

- Use qsub -sync, which will block until the qsubbed job is done... OK, so start a bunch of submission processes in parallel, perhaps by opening a piped filehandle for each one? But I can't figure out where to keep a counter so that the iteration is triggered only after every job has finished (see the sketch below).
- Use qsub -m e to get an email when the job is done... so I write my own smtpd... *shudder*... but this is the right concept of what I need.
- The newest version of PBS has something called "job arrays" that might be the answer, but unfortunately we don't have the newest version here.

I would think there is some obvious aspect of Unix IPC that I am missing, but I can't get a handle on this. Any help would be most appreciated.
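To make the first idea concrete, here is a minimal sketch in which the parent forks one child per submission and each child execs a blocking qsub. This assumes a qsub that supports blocking submission (-W block=true on PBS Pro, -sync y on SGE); pool_output() is the routine from the snippet above, and the job-script names are placeholders:

    use strict;
    use warnings;

    my @input = map { "job_$_.pbs" } 0 .. 59;   # placeholder job scripts

    for my $cycle (1 .. 5) {
        my @pids;
        for my $job (@input) {
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: run a *blocking* submission, so this process
                # lives exactly as long as the cluster job does.
                # -W block=true is PBS Pro syntax; SGE spells it -sync y.
                exec 'qsub', '-W', 'block=true', $job
                    or die "exec qsub failed: $!";
            }
            push @pids, $pid;
        }

        # No explicit counter needed: waitpid blocks once per child, so
        # control reaches pool_output() only when every job has finished.
        waitpid $_, 0 for @pids;

        @input = pool_output(@input);   # the pooling routine from the post
    }

The waitpid loop is effectively the missing counter: the kernel tracks the outstanding children, and the parent just drains them.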

Replies are listed 'Best First'.
Re: Parallel batch process plumbing issues
by secret (Beadle) on Dec 22, 2005 at 21:43 UTC

    Maybe make each job write a file when it is finished, and watch the directory until all the desired files are there?
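    For instance, a minimal sketch of such a wrapper, submitted to PBS in place of the bare binary (script and argument names are hypothetical); note the master still ends up polling the filesystem instead of qstat:

        #!/usr/bin/perl
        # wrapper.pl <binary> <partition> <done_dir>  (hypothetical interface)
        use strict;
        use warnings;

        my ($binary, $partition, $done_dir) = @ARGV;

        system($binary, $partition) == 0
            or die "$binary failed on $partition: exit $?";

        # Drop a sentinel file; the master counts *.done files in $done_dir
        # and proceeds once there are as many as submitted jobs.
        open my $fh, '>', "$done_dir/$partition.done"
            or die "can't create sentinel: $!";
        close $fh;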

    There is also Schedule::SGE, but it is only at version 0.02...

      Well, OK, I think I have a simple solution. Just make a sub-script that calls the binary, writes to a file when it's done, and checks whether it's the last one to chip into the output pool; if it is, it execs the parent script again. A little ugly, but not kludgy enough to make me shudder... Sounds good, but I'll need to be careful about atomicity when checking/writing those files... any pointers would be appreciated.
        Ah, OK: O'Reilly's Perl Cookbook, recipe 7.11. I think my problem is solved.
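        For reference, that recipe boils down to flock. A minimal sketch of an atomically incremented counter file, so the last job to finish can detect that it is last and restart the master (the master-script name is hypothetical):

            use strict;
            use warnings;
            use Fcntl qw(:DEFAULT :flock);

            my $TOTAL = 60;   # jobs per cycle, per the original post

            # Atomically bump a shared counter (Perl Cookbook recipe 7.11).
            sysopen my $fh, 'counter', O_RDWR | O_CREAT
                or die "can't open counter: $!";
            flock $fh, LOCK_EX or die "can't lock counter: $!";
            my $count = <$fh> || 0;
            $count++;
            seek $fh, 0, 0  or die "can't rewind counter: $!";
            truncate $fh, 0 or die "can't truncate counter: $!";
            print {$fh} "$count\n";
            close $fh or die "can't close counter: $!";   # lock released here

            # The job that brings the count to $TOTAL finished last.
            if ($count == $TOTAL) {
                unlink 'counter';           # reset for the next cycle
                exec './parent_script.pl'   # hypothetical master script
                    or die "exec failed: $!";
            }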
Re: Parallel batch process plumbing issues
by explorer (Chaplain) on Dec 23, 2005 at 19:06 UTC
    CPAN has several modules for running parallel jobs. Have a look at the Parallel::Simple module and read the final section, which compares the alternatives.
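    For comparison, a minimal sketch using Parallel::Simple's prun(), which forks one child per coderef and waits for all of them; each coderef here wraps a blocking submission (the blocking flag depends on the local PBS/SGE version, and the job-script names are placeholders):

        use strict;
        use warnings;
        use Parallel::Simple qw(prun);

        my @input = map { "job_$_.pbs" } 0 .. 59;   # placeholder job scripts

        # One coderef per job; prun() runs them all in parallel and
        # returns true only if every child exits successfully.
        my @jobs = map {
            my $job = $_;   # capture a copy for the closure
            sub { system('qsub', '-W', 'block=true', $job) == 0
                      or die "$job failed\n" };
        } @input;

        prun(@jobs) or die "at least one submission failed\n";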