casimo has asked for the wisdom of the Perl Monks concerning the following question:

I am using Parallel::ForkManager to fork WWW::Mechanize to do some crawling. There are some instances where I want the fork to stop based on the content of the HTML response.

I am having trouble since the fork wants to keep going until all children have been spawned(?).

use WWW::Mechanize;
use Parallel::ForkManager qw( );
use HTTP::Cookies;

use constant MAX_CHILDREN => 3;

{
    my $mech = WWW::Mechanize->new(timeout => 90);
    my @urllist = (
        "http://site1/",
        "http://site2/",
        "http://site3/",
        "http://site4/",
        "http://site5/"
        #etc...
    );

    my $pm = Parallel::ForkManager->new(MAX_CHILDREN);
    foreach my $url (@urllist) {
        # Forks and returns the pid for the child.
        my $pid = $pm->start() and next;

        $mech->get($url);
        $content=$mech->content;
        if ($content =~ m/string/) {
            #PROBLEM - exit the entire fork
        }

        # Exit child.
        $pm->finish();
    }
}

Replies are listed 'Best First'.
Re: Stop fork in Parallel::ForkManager
by BioLion (Curate) on Sep 25, 2009 at 16:52 UTC

    So you want to exit all children when you get the signal? Or just the child that received the signal?

    Finishing a child is easy enough - you can call $pm->finish if (...signal...).

    To kill all children you'll have to keep track of the currently active pids for the children and kill them yourself; Parallel::ForkManager doesn't have built-in facilities for this (as far as I am aware at least!). Information on how to do this can be found in: Parallel::ForkManager question.

    I am not sure what the best way to set off the infanticide would be though. I guess you could have a subroutine which accesses the list of pids, but there might well be stability issues, i.e. the child calls the sub, which kills the child... so you would have to make sure it is the original parent process that does the killing, and I am not sure how you would do this.
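
One way to make sure the killing is done by the parent is to let the child only report the condition through its exit status and leave the rest to the parent. The following is a minimal sketch using plain fork and waitpid rather than Parallel::ForkManager, so it runs with core Perl only. The exit code 2 as a "stop everyone" marker, the three hard-coded jobs and the sleep standing in for a slow download are all assumptions for illustration.

```perl
use strict;
use warnings;

my %kids;    # pid => job number, maintained only in the parent

for my $job (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: job 2 stands in for the page whose content matches.
        exit 2 if $job == 2;    # hypothetical "stop everyone" exit code
        sleep 30;               # stand-in for a slow download
        exit 0;
    }
    $kids{$pid} = $job;         # parent remembers every child it started
}

my @killed;
# Parent: reap children one by one until none are left.
while ((my $pid = waitpid(-1, 0)) > 0) {
    my $code = $? >> 8;         # exit code of the child just reaped
    delete $kids{$pid};
    if ($code == 2) {
        # Only the parent ever sends signals, so a child cannot kill itself.
        push @killed, sort { $a <=> $b } values %kids;
        kill 'TERM', keys %kids;
    }
}
print "jobs terminated by the parent: @killed\n";
```

Because the child never touches the pid list, the "child calls the sub, which kills the child" problem does not arise. With Parallel::ForkManager the same bookkeeping could be done with the pid that start() returns in the parent.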

    This way seems nasty though, and hopefully someone has a better idea!

    Just a something something...

      I also just found this :

      Which has a lot of the functionality you described - It has been very recently updated, and I haven't checked it out as yet, but certainly looks like a good alternative to Parallel::ForkManager when you want a little more control...
      HTH

      Just a something something...
Re: Stop fork in Parallel::ForkManager
by MidLifeXis (Monsignor) on Sep 25, 2009 at 16:49 UTC

    I can think of a few ways...

    • signals (kill parent, parent kills children, or kill process group)
    • shared memory
    • trigger file of some sort
    • some other IPC-type mechanism
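
As a rough illustration of the trigger-file option, here is a minimal sketch with plain fork. The file name under the system temp directory, the five numbered jobs and the choice of job 2 as the one that "matches" are assumptions for illustration, and the children are run one at a time only to keep the result predictable.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical trigger-file path; the parent's pid keeps it unique per run.
my $flag = File::Spec->catfile(File::Spec->tmpdir, "crawler-stop.$$");
unlink $flag;    # clear any stale file from an earlier run

my @started;
for my $job (1 .. 5) {
    last if -e $flag;               # some child asked for a stop
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        if ($job == 2) {            # stand-in for the content match
            open my $fh, '>', $flag or die "cannot create $flag: $!";
            close $fh;
        }
        exit 0;
    }
    push @started, $job;
    waitpid($pid, 0);               # one child at a time keeps the sketch deterministic
}
unlink $flag;                       # tidy up the trigger file
print "started jobs: @started\n";   # prints "started jobs: 1 2"
```

In a real crawler the same -e check would sit at the top of the Parallel::ForkManager loop. Children that are already running when the file appears are not affected, so this only prevents new ones from being spawned.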

    --MidLifeXis

    Please consider supporting my wife as she walks in the 2009 Alzheimer's Walk.

Re: Stop fork in Parallel::ForkManager
by MarkovChain (Sexton) on Sep 25, 2009 at 18:08 UTC
    If I were you, I would check out POE. It might not solve the problem using ForkManager, but IMHO it would be a worthy route.
Re: Stop fork in Parallel::ForkManager
by sierpinski (Chaplain) on Sep 25, 2009 at 19:38 UTC
    What about emptying your @urllist so that no new processes get spawned, then calling $pm->wait_all_children to ensure that all of the children are finished? (Or *do* you just want to kill them all off?)
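
A minimal sketch of this "stop spawning, then wait" idea, under some assumptions: the child reports a match through its exit code (1 here, chosen arbitrarily), the parent picks that up in a run_on_finish callback, and a flag plus "last" replaces emptying @urllist, since changing an array while foreach is iterating over it is not safe. The regex on the URL and the sleep are stand-ins for the real fetch and content test.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @urllist = map { "http://site$_/" } 1 .. 5;    # placeholder URLs from the question
my $pm      = Parallel::ForkManager->new(3);
my $stop    = 0;

# Runs in the parent each time the manager reaps a finished child.
$pm->run_on_finish(sub {
    my ($pid, $exit_code) = @_;
    $stop = 1 if $exit_code == 1;    # assumption: exit code 1 means "match found"
});

for my $url (@urllist) {
    last if $stop;                   # no new children once a match was reported
    $pm->start and next;             # parent moves on to the next URL

    # Child: a real crawler would fetch $url and inspect the content here.
    my $matched = $url =~ /site2/;   # stand-in for the content test
    sleep 1 unless $matched;
    $pm->finish($matched ? 1 : 0);   # exit code is passed to run_on_finish
}

$pm->wait_all_children;              # let the running children finish
print "stop flag: $stop\n";
```

The parent only learns about a child's exit code when the manager reaps that child, which happens inside start() or wait_all_children, so a few more URLs may still be started before the flag is seen.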

    Hopefully I'm not way off base on what you're wanting to do.
    /\ Sierpinski
Re: Stop fork in Parallel::ForkManager
by Urthas (Novice) on Sep 27, 2009 at 05:24 UTC

    I read the Synopsis and Description of Parallel::ForkManager because I was puzzled by the "and next" semantics in the call to start(). In the docs, it says, "The "and next" skips the internal loop in the parent process." Which internal loop is this? The surrounding foreach? Wouldn't that cause the rest of the foreach code to be skipped for the current iteration? In short, I'm still not clear on why the "and next" is required, and what it actually does. Would anyone be willing and able to comment on this? Thanks in advance.

      start/fork duplicates the current process, and both the original process (parent) and the copy (child) continue executing from that same point:
      #!/usr/bin/perl --
      use strict;
      use warnings;
      use Parallel::ForkManager;

      my $pm = new Parallel::ForkManager(3);
      for my $data ( 0 .. 2 ){
          my $pid = $pm->start and next; ## WITH NEXT
          # my $pid = $pm->start ; ## WITHOUT NEXT
          print "data ($data) pid ($pid) \$\$($$)\n";
          $pm->finish; # Terminates the child process
      }
      __END__
      The above program (with next) outputs
      data (0) pid (0) $$(-1792)
      data (1) pid (0) $$(-1856)
      data (2) pid (0) $$(-1916)
      The above program modified (without next) outputs
      data (0) pid (-1944) $$(1992)
      data (1) pid (-1656) $$(1992)
      data (1) pid (0) $$(-1656)
      data (0) pid (0) $$(-1944)
      data (2) pid (-1924) $$(1992)
      data (2) pid (0) $$(-1924)
      When pid is 0, it is the child process; when it is not zero, it is the parent process.

      Fork (operating system)

      Mr. Peabody Explains fork()

        You answered my question perfectly (I confess I'm new to the IPC game; actually I started reading that part of the Llama today). Good links. :)

        With regard to the original question... I found this code on Wikipedia, and it seems to do what you want(?).

        #!/usr/bin/perl
        $pid = fork();  #Declare fork
        if ($pid == 0) {  #Jump into the Child process
            for ($j = 0; $j < 10; $j++) {
                print "child: $j\n";
                sleep 1;
            }
            exit(0);  #Exit the fork [child process]
        } elsif ($pid > 0) {
            for ($i = 0; $i < 10; $i++) {
                print "parent: $i\n";
                sleep 1;
            }
            exit(0);  #Exit parent
        }

        So the thing here is to test the pid. In your code, you would say:

        $pm->finish if ($pid == 0 && $content =~ m/string/);
        #...no match, do more work...
        $pm->finish;

        Or am I totally missing the boat here (likely)? :)