Takamoto has asked for the wisdom of the Perl Monks concerning the following question:

This is my first attempt to process things in parallel with Parallel::ForkManager on a server. I have several subroutines that collect data through APIs. I want to run them in parallel and then merge the results. This is my script; it is not elegant, of course, but it runs. I do not see a huge difference in run time between this parallel version and calling the subroutines one after the other (the script saves me about a third of the time), so I wanted to ask for your wisdom about my script.

use Parallel::ForkManager;

my $max_procs = 6;
my @names = qw( 0 2 3 4 5 0 );
my @DataStructure;

my $pm = Parallel::ForkManager->new($max_procs, @ARGV);

$pm->run_on_finish( sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;
    my @results= @$data_structure_reference;
    if (@results){
        push (@DataStructure, @results);
    }
});

foreach my $child ( 0 .. $#names ) {
    my $pid = $pm->start($names[$child]) and next;
    my @results;
    if ($child eq 1){
        @results=getResultsAPI_1();
    }
    elsif ($child eq 2){
        @results=getResultsAPI_2();
    }
    elsif ($child eq 3){
        @results=getResultsAPI_3();
    }
    elsif ($child eq 4){
        @results=getResultsAPI_4();
    }
    elsif ($child eq 5){
        @results=getResultsAPI_5();
    }
    elsif ($child eq 6){
        @results=getResultsAPI_6();
    }
    $pm->finish($child, \@results);
}

$pm->wait_all_children;

Re: Parallel::ForkManager right approach
by roboticus (Chancellor) on Oct 13, 2019 at 17:59 UTC

    Takamoto:

    To test your code, I added this to the end:

    sub tprint {
        my $t = time;
        print "$t: ", shift, "\n";
    }

    sub getResultsAPI {
        my ($name, $doze_time) = @_;
        tprint "API_$name: dozing for ${doze_time}s";
        sleep $doze_time;
        tprint "API_$name: done";
    }

    sub getResultsAPI_1 { getResultsAPI(1, 12) }
    sub getResultsAPI_2 { getResultsAPI(2, 22) }
    sub getResultsAPI_3 { getResultsAPI(3, 16) }
    sub getResultsAPI_4 { getResultsAPI(4, 5) }
    sub getResultsAPI_5 { getResultsAPI(5, 11) }
    sub getResultsAPI_6 { getResultsAPI(6, 7) }

    And it seemed to do what you'd expect:

    $ perl pm_11107400.pl
    1570989189: API_1: dozing for 12s
    1570989189: API_2: dozing for 22s
    1570989189: API_3: dozing for 16s
    1570989189: API_4: dozing for 5s
    1570989189: API_5: dozing for 11s
    1570989194: API_4: done
    1570989200: API_5: done
    1570989201: API_1: done
    1570989205: API_3: done
    1570989211: API_2: done

    Of course, if your APIs contend with each other for resources (such as hitting the hard drive a lot), you may not save much time. If they run well in parallel as the simple sleep timers do in the case I tried, you can save much more. Make sure you look at what your different APIs do to see if they conflict with each other. Bundling IO-heavy operations with CPU-heavy operations tends to work well to save you time. Similarly, you can *lose* time if you have processes always contending for the same resources.
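
    One way to see where the time goes is to have each child time its own work and hand the figure back to the parent. The following is a minimal sketch under assumptions, not a measurement of the real APIs: the names API_1 to API_3 and the short sleeps are hypothetical stand-ins for the actual calls.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);
use List::Util qw(sum max);
use Parallel::ForkManager;

# Hypothetical workloads: short sleeps stand in for the real API calls.
my %work = (
    API_1 => sub { sleep 0.3 },
    API_2 => sub { sleep 0.5 },
    API_3 => sub { sleep 0.2 },
);

my %elapsed;
my $pm = Parallel::ForkManager->new(scalar keys %work);

# Runs in the parent each time a child is reaped.
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    $elapsed{$ident} = $data->{took} if defined $data;
});

for my $name (sort keys %work) {
    $pm->start($name) and next;        # parent: go on to the next child
    my $t0 = time;
    $work{$name}->();                  # child: do the work and time it
    $pm->finish(0, { took => time - $t0 });
}
$pm->wait_all_children;

# Slowest first: the slowest child sets the floor for the parallel run.
printf "%-6s %.2fs\n", $_, $elapsed{$_}
    for sort { $elapsed{$b} <=> $elapsed{$a} } keys %elapsed;
printf "sum %.2fs (rough sequential cost), max %.2fs (floor for the parallel run)\n",
    sum(values %elapsed), max(values %elapsed);
```

    If one call accounts for most of the sum, running the others alongside it cannot save much, whatever the number of processes.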

    Note that you have a minor bug, though, in that you never run API_6. I'm assuming the bug was introduced by your simplification of the test code.
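
    The cause is visible in the posted loop: the index runs from 0 to 5 while the branches test for 1 to 6, so index 0 matches nothing and 6 is never reached. A dispatch table keyed by the API number avoids that kind of mismatch. This is a minimal sketch, assuming each API sub returns a list; the anonymous subs below are hypothetical stand-ins for getResultsAPI_1 to getResultsAPI_6.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical stand-ins for getResultsAPI_1 .. getResultsAPI_6.
my %api = map {
    my $n = $_;
    ( $n => sub { return ("result from API_$n") } );
} 1 .. 6;

my @names = sort { $a <=> $b } keys %api;
my $pm    = Parallel::ForkManager->new(scalar @names);

my @DataStructure;
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_ref) = @_;
    # $data_ref is undef if the child sent nothing back (for example, it died).
    push @DataStructure, @$data_ref if defined $data_ref;
});

for my $name (@names) {
    $pm->start($name) and next;      # parent: go on to the next child
    my @results = $api{$name}->();   # child: run exactly one API
    $pm->finish(0, \@results);       # exit code 0, results go to the parent
}
$pm->wait_all_children;

print "$_\n" for sort @DataStructure;
```

    The first argument to finish is the child's exit code, so the sketch passes 0 for success instead of the loop index.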

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Parallel::ForkManager right approach
by davido (Cardinal) on Oct 14, 2019 at 05:09 UTC

    Are forks needed for this, or could you be making your API requests using a non-blocking user agent?

    Forks have their own overhead, and when you make an HTTP request against an API, most of your time is spent just sitting around waiting for the response -- you're not CPU bound. If this describes your situation, have a look at Mojo::UserAgent, Mojo::IOLoop, and Mojo::Promise. There are other options: AnyEvent::UserAgent, for example, but Mojo::UserAgent makes such tidy work of the problem, it's a really nice option.

    Of course I don't know what type of API you're hitting. Hopefully this suggestion is useful in your case. Example:

    my @promises = map { $ua->get_p($_) } @api_points;

    Mojo::Promise->all(@promises)->then(sub {
        my (@results) = @_;
        # Each element is an array ref whose first item is the transaction.
        print Dumper $_->[0]->result->json foreach @results;
    });

    If you have six @api_points this will make six GET requests in rapid fire and then process the results when they all return. There's a little code that I didn't show that surrounds this; using Mojo::UserAgent, getting a $ua object, kicking off the event loop if it hasn't started, that sort of thing. But it's all documented in the links above.
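
    For illustration, the surrounding code might look like the sketch below. It is not the setup from the original post: so that it runs without network access, it assumes a tiny in-process Mojolicious::Lite app as a hypothetical stand-in for the real APIs, and the /api/:id route and its JSON payload are invented for the example.

```perl
use strict;
use warnings;
use Mojolicious::Lite;
use Mojo::UserAgent;
use Mojo::Promise;

# Hypothetical stand-in for the real APIs: an in-process app.
get '/api/:id' => sub {
    my $c = shift;
    $c->render(json => { id => $c->param('id'), status => 'ok' });
};
app->log->level('fatal');    # keep request logging out of the output

my $ua = Mojo::UserAgent->new;
$ua->server->app(app);       # relative URLs are served by the embedded app

my @api_points = map { "/api/$_" } 1 .. 6;
my @collected;

# Fire all requests, then handle the responses once every one has returned.
my @promises = map { $ua->get_p($_) } @api_points;
Mojo::Promise->all(@promises)->then(sub {
    my @results = @_;        # one array ref per promise, in request order
    push @collected, $_->[0]->result->json for @results;
})->catch(sub {
    my $err = shift;
    warn "Request failed: $err";
})->wait;                    # runs the event loop until the promise settles

printf "%s: %s\n", $_->{id}, $_->{status} for @collected;
```

    With real endpoints, @api_points would hold absolute URLs and the embedded app and the server->app call would go away.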


    Dave

Re: Parallel::ForkManager right approach
by karlgoethebier (Abbot) on Oct 14, 2019 at 15:49 UTC

    See also. Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'