hotel has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am writing a script that creates an array whose size ranges between 15 and 20 each time I run it. The number of elements in the array usually changes from run to run because its size depends on some calculations I make, which are not important at this point.

I am using every element of this array to execute a child process (a procedure I am not really familiar with), as follows:

    my $passed_dir = $_[0];
    my $dirtoget = "${passed_dir}/StretchingDecaAlanine/GMXCubicBox/Umbrella";
    opendir(IMD, $dirtoget) || die("Cannot open directory");
    my @files = readdir(IMD);
    closedir(IMD);
    foreach my $f (@files) {
        if ( (-d $f) and ($f ne ".") and ($f ne "..") ) {
            my $secdirtoget = "${passed_dir}/StretchingDecaAlanine/GMXCubicBox/Umbrella/${f}";
            chdir($secdirtoget);
            system("ln -s ../md_umbrella.mdp ../index.ndx ../topol.top ../posre_CTerm.itp ../posre.itp .");
            system("grompp -f md_umbrella.mdp -c pullconf.gro -n index.ndx");
            my $pid = fork();
            if ($pid == -1) {
                die;
            }
            elsif ($pid == 0) {
                print "Running mdrun for within $f\n";
                exec 'mdrun', '>logje 2>&1 &' or die;
            }
            while (wait() != -1) {}
            print "Done with mdrun in $f\n";
            chdir("${passed_dir}/StretchingDecaAlanine/GMXCubicBox/Umbrella");
        }
    }
Above you see that for each element in @files:
1- I chdir to the corresponding folder,
2- do some linking,
3- make a system call to a program,
4- execute a process (in the elsif) and wait for it to end,
5- return (the last chdir) to the parent directory and do the same thing with the next folder (element) in my array @files.

Now, the thing is: the program I run with exec takes almost 45 minutes to finish for a single folder in @files. As I already mentioned, I do this for at least 15 and at most 18-19 folders (the size of @files in that particular run). The program I am executing (mdrun) can run as many instances at once as you want, so instead of running them one by one (waiting for each to finish), I could launch all of them without using fork(), but then the computer gets really slow.

The advice I want from you is a way to split @files into smaller arrays (of size, say, 4), then execute 4 mdruns at a time and wait (maybe again using fork) for those 4 to finish before continuing with the next 4. But as I already said, the size of @files changes on each run. So if it is of size 17, I need to split it into chunks of 4+4+4+4+1; if it is of size 15, then 4+4+4+3. I would then execute mdrun for the first 4 elements, wait for those to finish, execute it for the following 4, wait again, and so on, until the last chunk of 1-3 elements.
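For what it's worth, the chunking part by itself can be done with splice; here is a generic sketch (the subroutine name and the 1..17 stand-in list are made up for illustration):

```perl
use strict;
use warnings;

# Split a list into chunks of at most $chunk_size elements each.
sub chunks {
    my ($chunk_size, @items) = @_;
    my @chunks;
    push @chunks, [ splice(@items, 0, $chunk_size) ] while @items;
    return @chunks;
}

my @dirs   = (1 .. 17);          # stand-in for @files
my @groups = chunks(4, @dirs);   # chunk sizes: 4, 4, 4, 4, 1
print scalar(@$_), "\n" for @groups;
```

Because splice removes the elements it returns, the last chunk automatically has whatever is left (1, 2 or 3 elements), so the 17 → 4+4+4+4+1 and 15 → 4+4+4+3 cases fall out for free.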

Any ideas? I hope my question was clear.. Many thanks..

PS: Some of you may say that I am using fork, exec and wait in a wrong/inappropriate way. You are most probably right, because I am not really familiar with these tools and I could not find a good page that explains how to wait for a process (called from within a Perl script) to finish before continuing with the rest of the script. I lifted the above fork-exec-wait from a website, and right now it works for me; but if you can advise a better usage, I can of course replace it. All I want the fork-exec-wait to do is wait until the "mdrun" process has finished (it returns nothing to the script; it is like a mkdir command, the only difference being that it takes 45 minutes to finish).
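For reference, the basic fork/exec/wait pattern for one child looks like this (a minimal sketch; "sleep 1" is a stand-in for the real command, and waitpid blocks on that specific child):

```perl
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: replace this process with the external command.
    # On success, exec never returns.
    exec('sleep', '1') or die "exec failed: $!";
}

# Parent: block until this particular child has exited.
waitpid($pid, 0);
print "child $pid exited with status ", $? >> 8, "\n";
```

The child's exit status ends up in `$?` after waitpid, so the parent can check whether the command succeeded before moving on.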

If the title of the post is not descriptive enough, or simply incorrect, please advise a better title for this problem so that I can change it.

Replies are listed 'Best First'.
Re: Splitting an array into subarrays of size ~n
by duelafn (Parson) on Mar 21, 2011 at 13:41 UTC

    See Parallel::ForkManager. As in their example:

    use Parallel::ForkManager;
    my $pm = new Parallel::ForkManager($MAX_PROCESSES);
    # ...
    foreach my $f (@files) {
        if ( (-d $f) and ($f ne ".") and ($f ne "..") ) {
            my $pid = $pm->start and next;
            chdir ...
            system ...
            system ...
            system ...
            $pm->finish;
        }
    }
    $pm->wait_all_children;

    Update Since ForkManager handles the forking, you will not need to fork/exec mdrun, a system will be fine.
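Filled in a bit more (an untested sketch: the @umbrella_dirs list and the echo command are placeholders for the real run directories and the real mdrun invocation from the question):

```perl
use strict;
use warnings;
use Parallel::ForkManager;   # CPAN module

# Hypothetical list of run directories; in the real script these would
# be the Umbrella subdirectories collected with readdir.
my @umbrella_dirs = ('.') x 3;

my $pm = Parallel::ForkManager->new(4);   # at most 4 children at once

for my $dir (@umbrella_dirs) {
    $pm->start and next;                  # parent: spawn child, continue loop
    chdir $dir or die "chdir $dir: $!";
    # Shell redirection works in system's single-string form, so no
    # extra fork/exec is needed around the command:
    system("echo mdrun > /dev/null 2>&1");  # stand-in for: mdrun > logje 2>&1
    $pm->finish;                          # child exits here
}
$pm->wait_all_children;                   # block until every child is done
```

ForkManager itself enforces the "at most 4 at a time" limit, so no manual splitting of the directory list is needed.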

    Update Added call to wait_all_children

    Good Day,
        Dean

Re: Splitting an array into subarrays of size ~n
by moritz (Cardinal) on Mar 21, 2011 at 13:47 UTC
Re: Splitting an array into subarrays of size ~n
by jethro (Monsignor) on Mar 21, 2011 at 13:56 UTC

If you just want working code, I have the perfect solution for you: Parallel::ForkManager. A perfect fit, so to speak.

If you want to know more about fork and wait, just use "perldoc -f fork", "perldoc -f wait", "perldoc -f waitpid" and "perldoc perlipc", or read the corresponding man pages of the underlying C library.

Basically you should be able to do up to 4 forks, then wait until you have called wait() an equal number of times (or better, until wait() returns -1). Getting the first 4 entries of an array is easy: just use my @part = splice(@array, 0, 4). How many elements you really got (the array may have only 1, 2 or 3 entries left) you can find out with scalar(@part).
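Put together, the splice-plus-wait approach might look like this (a generic sketch: the 1..10 queue and the "sleep 1" command are placeholders for the real directory list and mdrun invocation):

```perl
use strict;
use warnings;

my @queue = (1 .. 10);    # stand-in for @directories

# splice removes and returns up to 4 elements; an empty list ends the loop.
while (my @part = splice(@queue, 0, 4)) {
    my @pids;
    for my $item (@part) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: replaced by the external command.
            exec('sleep', '1') or die "exec failed: $!";
        }
        push @pids, $pid;
    }
    # Wait for this batch of at most 4 children before starting the next.
    waitpid($_, 0) for @pids;
    print "batch of ", scalar(@part), " done\n";
}
```

Each pass through the while loop launches one batch and blocks until all of its children have exited, which is exactly the 4+4+...+remainder behaviour asked for.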

Thanks for all of the answers. I do not want to use Parallel::ForkManager because I am not quite sure how to ship it with my script (this is homework, but do not worry, I am doing an "extra" feature in the homework, so it is okay; I could run all 20 child processes at the same time and no one would care, but I want to do ~4 at a time when possible).

I am just not sure how to fork ~4 different processes. Below is the code I just wrote:
      while (scalar(@directories) > 0) {
          # Take the first 4 when there are 4 or more elements in the array
          if (scalar(@directories) >= 4) {
              @part = @directories[0..3];
          }
          # Take as many as possible when <4 elements in the array
          elsif (scalar(@directories) < 4) {
              @part = @directories[0 .. $#directories];
          }
          # Shorten the array by the number of elements taken in @part
          for (my $i = 0; $i < scalar(@part); $i++) {
              shift(@directories);
          }
          # For each element in @part, create children processes
          #************************************
          foreach my $f (@part) {
              my $secdirtoget = "${passed_dir}/StretchingDecaAlanine/GMXCubicBox/Umbrella/${f}";
              chdir($secdirtoget);
              system("ln -s ../md_umbrella.mdp ../index.ndx ../topol.top ../posre_CTerm.itp ../posre.itp .");
              system("grompp -f md_umbrella.mdp -c pullconf.gro -n index.ndx");
              exec 'mdrun', '>logje 2>&1 &';
              chdir("${passed_dir}/StretchingDecaAlanine/GMXCubicBox/Umbrella");
          }
          #************************************
      }
      The thing is that I want my script to wait until the 4 or fewer mdrun calls in the foreach my $f (@part) loop have finished their job, and only then continue with the next 4 or fewer elements when a new partition is made at the top of the while loop. Many thanks.
        Hi again.. I guess I figured it out :-) The below seems to work:
        sub mdrunner {
            my $passed_dir = $_[0];
            my $dirtoget = "${passed_dir}/StretchingDecaAlanine/GMXCubicBox/Umbrella";
            my (@directories, @part) = ();
            opendir(IMD, $dirtoget) || die("Cannot open directory");
            my @files = readdir(IMD);
            closedir(IMD);
            foreach my $g (@files) {
                push(@directories, $g) if ((-d $g) and ($g ne ".") and ($g ne ".."));
            }
            while (scalar(@directories) > 0) {
                # Take the first 4 when there are 4 or more elements in the array
                if (scalar(@directories) >= 4) {
                    @part = @directories[0..3];
                }
                # Take as many as possible when <4 elements in the array
                elsif (scalar(@directories) < 4) {
                    @part = @directories[0 .. $#directories];
                }
                # Shorten the array by the number of elements taken in @part
                for (my $i = 0; $i < scalar(@part); $i++) {
                    shift(@directories);
                }
                # For each element in @part, create children processes
                foreach my $f (@part) {
                    my $secdirtoget = "${passed_dir}/StretchingDecaAlanine/GMXCubicBox/Umbrella/${f}";
                    chdir($secdirtoget);
                    system("ln -s ../md_umbrella.mdp ../index.ndx ../topol.top ../posre_CTerm.itp ../posre.itp .");
                    system("grompp -f md_umbrella.mdp -c pullconf.gro -n index.ndx");
                    my $pid = fork();
                    if ($pid == -1) {
                        die;
                    }
                    elsif ($pid == 0) {
                        print "Running mdrun for within $f\n";
                        exec 'mdrun', '>logje 2>&1 &' or die;
                    }
                    chdir("${passed_dir}/StretchingDecaAlanine/GMXCubicBox/Umbrella");
                }
                while (wait() != -1) {}
                print "Done with mdrun in @part\n";
            }
        }