john.goor has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
I'm getting desperate trying to run either subroutines
or command-line commands in parallel.
There's Parallel::Jobs, Parallel::ForkManager and another
one.
The point is: I want to run many jobs in parallel (for indexing
purposes), for example:
my @collections = qw(col1 col2 col3 col4 col5);
my $max_tasks   = 3;
foreach my $collection_alias (@collections) {
    start_parallel_task($collection_alias);
}
wait_until_all_processes_have_finished();

Also, all output from the sub-processes should go to the
STDOUT & STDERR of the parent.

It seems that each of the different modules can accomplish about
90% of my wish, but lacks the other 10%.

Can someone give me a code snippet which does this exact
thing?
I have read and read and searched and searched and tried and
tried, but did not succeed. Please don't shoot if I
read/searched/tried not hard enough. ;-)
On request I can cough up some snippets that fail.
Thanks in advance, o enlightened monks.
John

Replies are listed 'Best First'.
Re: Parallel tasks
by tachyon (Chancellor) on Jul 12, 2004 at 11:07 UTC
    Parallel::ForkManager does exactly that. It is clearly documented. What's the problem, exactly? Kids share STDOUT/STDERR with the parent unless you make it otherwise. Are you trying to run system()? In that case you want a piped open, open2, open3, or backticks to fire up the external process in the kids so you can grab the output.....
    C:\>type test.pl
    use Parallel::ForkManager;

    $|++;    # unbuffer output

    my @collections = qw(col1 col2 col3 col4 col5);
    my $max_tasks   = 3;
    $pm = new Parallel::ForkManager($max_tasks);

    for my $collection (@collections) {
        my $pid = $pm->start and next;
        # do something with $collection in kid
        do { print "$_\t$collection\n"; sleep 1 } for 1..2;
        # kill kid
        $pm->finish;
    }

    C:\>perl test.pl > out

    C:\>type out
    1   col1
    1   col2
    1   col3
    2   col1
    2   col2
    2   col3
    1   col4
    1   col5
    2   col4
    2   col5

    C:\>
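    The piped-open-in-the-kids idea mentioned above can be sketched with plain fork() so it stays self-contained (the echo command and collection names are placeholders; with Parallel::ForkManager the same piped open would sit between $pm->start and $pm->finish):

```perl
use strict;
use warnings;

my @collections = qw(col1 col2 col3);   # placeholder names

for my $collection (@collections) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    next if $pid;                       # parent keeps looping

    # Child: run an external command (placeholder: echo) through a
    # piped open and copy its output; STDOUT/STDERR are inherited
    # from the parent, so everything lands in the parent's streams.
    open my $out, '-|', "echo indexing $collection"
        or die "cannot run command for $collection: $!";
    print while <$out>;
    close $out;
    exit 0;
}
wait() for @collections;                # reap all the kids
```

    Because the children never reopen their standard handles, no extra plumbing is needed to get their output onto the parent's STDOUT.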

    cheers

    tachyon

      That can't be right? All 3 processes are processing all 5 collections.

        Rubbish. Perhaps this slightly modified code will help you see what is happening more easily:

        C:\>type test.pl
        use Parallel::ForkManager;

        my @collections = qw(col1 col2 col3 col4 col5);
        my $max_tasks   = 3;
        $pm = new Parallel::ForkManager($max_tasks);
        $|++;

        my $start = time();
        for my $collection (@collections) {
            my $pid = $pm->start and next;
            printf "Begin processing $collection at %d secs.....\n", time()-$start;
            sleep rand(5)+2;
            printf ".... $collection done at %d secs!\n", time()-$start;
            $pm->finish;
        }

        C:\>perl test.pl
        Begin processing col1 at 0 secs.....
        Begin processing col2 at 0 secs.....
        Begin processing col3 at 0 secs.....
        .... col3 done at 3 secs!
        Begin processing col4 at 3 secs.....
        .... col1 done at 4 secs!
        Begin processing col5 at 4 secs.....
        .... col2 done at 5 secs!
        .... col4 done at 6 secs!
        .... col5 done at 9 secs!

        C:\>

        cheers

        tachyon

      Indeed I tried that one, and you're right, I should have been a
      bit more specific.
      It does exactly what I want it to do, at least the example
      code does.
      The moment I start messing around by, say, adding a home-brew
      module (which contains really nothing special, just some
      common stuff to be re-used by other scripts), it 'misbehaves'.
      This misbehaviour means that when I specify a process queue
      of 10, it runs 10 processes, but does not respawn new ones.
      In the end just 10 processes have run and
      finished.
      The example code I use is mostly from the book, slightly customized:
      #perl.exe -w

      # INIT ----
      use strict;
      use Parallel::ForkManager;
      use MyTest;    # Mind this one, it's the one that messes things up

      # VARS ----
      my $max_procs = 10;
      my $wait      = 5;
      my @collections = qw(
          adw adw_wet_compleet ag-dou ag-gi ag-hc ag-hlo ag-its-sol
          ag-its-sol-hot agents alg_diversen alg_dossier_juris_en_reso
          alg_ellis alg_ellis_en alg_kb alg_kfb alg_kluwer_rest alg_pdf
          alg_rest alg_sdu am b_adw b_adw_wet_compleet b_ag-dou b_ag-gi
          b_ag-hc b_ag-hlo b_ag-its-sol b_ag-its-sol-hot b_agents
          b_alg_diversen b_alg_dossier_juris_en_reso b_alg_ellis
          b_alg_ellis_en b_alg_kb b_alg_kfb b_alg_kluwer_rest b_alg_pdf
          b_alg_rest b_alg_sdu b_am b_bibliotheek b_bm b_bna-mix
          b_browsedocs b_curbel b_fmod b_help b_hvg b_hvg-kluwer b_mrp
          b_pz b_refdocs b_tss b_tss-pdf bibliotheek bm bna-mix browsedocs
          curbel fmod help hvg hvg-kluwer images mrp pz refdocs tss tss-pdf
      );

      # MAIN ----
      my $pm = new Parallel::ForkManager($max_procs);

      # Set up a callback for when a child finishes up so we can
      # get its exit code
      $pm->run_on_finish(
          sub {
              my ($pid, $exit_code, $ident) = @_;
              print "** $ident finished (PID: $pid) and exit code: $exit_code\n";
          }
      );
      $pm->run_on_start(
          sub {
              my ($pid, $ident) = @_;
              print "** $ident started, pid: $pid\n";
          }
      );
      $pm->run_on_wait(
          sub { print "** waiting\n"; },
          0.5
      );

      foreach my $child ( 0 .. $#collections ) {
          my $pid = $pm->start($collections[$child]) and next;

          # This code is the child process
          print "Child: $collections[$child]\n";
          sleep $wait;

          # End of child process
          $pm->finish($child);    # pass an exit code to finish
      }

      print "Waiting for Children...\n";
      $pm->wait_all_children;
      print "Everybody is out of the pool!\n";
      I pin-pointed the problem, try this for the 'MyTest' module...
      package MyTest;

      #------------------------------------------------------------------------
      # INIT
      #------------------------------------------------------------------------
      use strict;
      use Win32::OLE;
      use Win32::OLE::Variant;

      1;

      This will show the problem as mentioned above.
      Needs some meditation, doesn't it? :-)

        fork() on win32 is an emulation. It is known not to play well with Win32::OLE, which tickles the bugs. The issue is the children segfaulting on exit.

        You may be able to get something working with threads; then again, you may not. I suggest the easiest way would be to use Win32::Process to fire up your 10 processes. They will each have their own perl interpreter, so there should be no issues provided you have enough memory. All you have to do is create your initial pool and loop over them, waiting for one to exit so you can spawn the next one.
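        The pool-and-respawn pattern described above can be sketched with plain fork/waitpid (a portable illustration with placeholder task names; on Win32 one would replace the fork with Win32::Process::Create and reap via the process handles instead):

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

my @tasks     = map { "task$_" } 1 .. 8;   # placeholder work items
my $max_procs = 3;
my %running;                               # pid => task name
my $finished  = 0;

while ( @tasks or %running ) {
    # Top up the pool while there is work and room
    while ( @tasks and keys %running < $max_procs ) {
        my $task = shift @tasks;
        my $pid  = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {                 # child: do the work, then exit
            # ... real indexing work would go here ...
            exit 0;
        }
        $running{$pid} = $task;            # parent: remember the kid
    }

    # Block until one child exits, then loop around to respawn
    my $pid = waitpid( -1, 0 );
    if ( $pid > 0 ) {
        delete $running{$pid};
        $finished++;
    }
}
```

        At no point do more than $max_procs children exist, and a new one is spawned as soon as any slot frees up, which is exactly the respawning behaviour missing from the failing run above.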

        cheers

        tachyon

Re: Parallel tasks
by bageler (Hermit) on Jul 12, 2004 at 13:45 UTC
Re: Parallel tasks
by reyjrar (Hermit) on Jul 12, 2004 at 21:53 UTC
    I wrote this module, Parallel::ForkControl, because I couldn't find any modules at the time that abstracted out the forking code in a way that let me adjust the forking performance, so my co-workers and I could focus on the function of the code. I'd be interested in hearing your comments on the functionality of the module.
    use Parallel::ForkControl;

    my $forker = new Parallel::ForkControl(
        MaxKids   => 5000,
        MinKids   => 5,
        WatchLoad => 1,
        MaxLoad   => 5.50,
        Code      => \&mySub,
    );

    foreach my $col (@collections) {
        $forker->run($col);
    }
    $forker->cleanup();

    sub mySub {
        my $collection = shift;
        ......
        return;
    }

    Thanks,

    UPDATED: I fixed the code parameter in the example to be a code ref.

    -brad..
      I tried your example (see code below) but it didn't work.

      What am I doing wrong?

      #perl.exe -w

      # INIT ----
      use Parallel::ForkControl;

      # VARS ----
      my @collections = qw(
          adw adw_wet_compleet ag-dou ag-gi ag-hc ag-hlo ag-its-sol
          ag-its-sol-hot agents alg_diversen alg_dossier_juris_en_reso
          alg_ellis alg_ellis_en alg_kb alg_kfb alg_kluwer_rest alg_pdf
          alg_rest alg_sdu am b_adw b_adw_wet_compleet b_ag-dou b_ag-gi
          b_ag-hc b_ag-hlo b_ag-its-sol b_ag-its-sol-hot b_agents
          b_alg_diversen b_alg_dossier_juris_en_reso b_alg_ellis
          b_alg_ellis_en b_alg_kb b_alg_kfb b_alg_kluwer_rest b_alg_pdf
          b_alg_rest b_alg_sdu b_am b_bibliotheek b_bm b_bna-mix
          b_browsedocs b_curbel b_fmod b_help b_hvg b_hvg-kluwer b_mrp
          b_pz b_refdocs b_tss b_tss-pdf bibliotheek bm bna-mix browsedocs
          curbel fmod help hvg hvg-kluwer images mrp pz refdocs tss tss-pdf
      );

      # SUBS ----
      sub mySub {
          my $collection = shift;
          print "Col: $collection\n";
          return;
      }

      # MAIN ----
      my $forker = new Parallel::ForkControl(
          MaxKids   => 5000,
          MinKids   => 5,
          WatchLoad => 1,
          MaxLoad   => 5.50,
          Code      => &mySub,
      );

      foreach my $col (@collections) {
          $forker->run($col);
      }
      $forker->cleanup();

      The output I get is:
      Col: CANNOT RUN A IN RUN()
      My bad.. I forgot a \ try this:
      my $forker = new Parallel::ForkControl(
          MaxKids   => 5000,
          MinKids   => 5,
          WatchLoad => 1,
          MaxLoad   => 5.50,
          Code      => \&mySub,
      );
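      The missing backslash is the whole bug: `&mySub` calls the subroutine right away and passes its return value as the Code parameter, while `\&mySub` passes a code reference to be called later. A quick illustration (the sub and its argument are for demonstration only):

```perl
use strict;
use warnings;

sub mySub { return "called with @_" }

my $result = &mySub;     # calls mySub immediately; $result is its return value
my $ref    = \&mySub;    # takes a code reference; nothing runs yet

print ref($ref), "\n";          # prints "CODE"
print $ref->('later'), "\n";    # prints "called with later"
```

      Parallel::ForkControl stores the reference and invokes it in each child, which is why it chokes when handed a plain string instead.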
      -brad..
Re: Parallel tasks
by BrowserUk (Patriarch) on Jul 13, 2004 at 09:23 UTC

    Seeing you're having trouble with the forking, you could try this and see how it fares. I've rather over-commented the code. HTH.

    #! perl -slw
    use strict;
    use threads qw[ yield ];
    use Thread::Queue;

    ## The command to run (DIR for testing)
    my $COMMAND = 'dir /s';

    ## FIFOs in and out
    my $Qwork    = new Thread::Queue;
    my $Qresults = new Thread::Queue;

    ## Track our workers
    my $workers :shared = 0;

    sub worker {
        $workers++;              ## Count 'em in
        threads->self->detach;   ## No return

        ## Wait until we got something to do
        yield until $Qwork->pending;

        ## Whilst there is work
        while ( $Qwork->pending ) {
            ## Grab some
            my $task = $Qwork->dequeue;

            ## do it and get the output
            open my $in, "$COMMAND $task 2>&1 |"
                ## Maybe queue it back rather than yelling
                or warn "command $task failed: $!";

            ## Grab the output and post it back to the main thread
            ## Prefix with the work item if it must be segregated
            $Qresults->enqueue( $_ ) while <$in>;
        }
        $workers--;              ## and count 'em out again
    }

    ## Start a pool of workers
    threads->new( \&worker ) for 1 .. 3;

    ## Give 'em something to do
    $Qwork->enqueue( $_ ) for qw[ D: P: Q: T: U: V: W: Z: ]; ## Drives to dir

    ## wait for something to do
    yield until $Qresults->pending;

    ## Workers running
    while( $workers ) {
        ## get a result if available
        my $result = $Qresults->dequeue if $Qresults->pending;

        ## Do something with $result
        printf $result if defined $result;

        ## Give up the timeslice if workers are running
        ## but we have no results yet
        yield while $workers and not $Qresults->pending;
    }

    ## All done.

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon