Hi marioroy,
As you said, it's apples to oranges, so let's try to be more appleish (or is it orangeish?).
First attempt was to group job ids so a child does more than one during its lifetime. Turns out to be fairly simple.
Right after I posted Re: Parallel::ForkManager and multiple datasets, I realized I had written roughly the same forking code several times, so it was time
to move it to a module.
Here's the module. It uses callbacks for the child code and for the parent code that processes the child's returned value.
package Forking::Amazing; sub run ($&&@) { my ( $maxforks, $childcallback, $resultcallback, @ids ) = @_; use Storable qw( freeze thaw ); use IO::Select; my %fh2id; my $sel = IO::Select->new; while( @ids or $sel->count ) # unstarted or active { while( @ids and $sel->count < $maxforks ) # start all forks allowe +d { my $id = shift @ids; if( open my $fh, '-|' ) # forking open { $sel->add( $fh ); # parent $fh2id{$fh} = $id; } else # child code goes here { print freeze $childcallback->($id); exit; } } for my $fh ( $sel->can_read ) # collecting child data { $sel->remove( $fh ); $resultcallback->($fh2id{$fh}, thaw do { local $/; <$fh> }); } } } 1; __END__ =head1 EXAMPLE program use Forking::Amazing; # small example program use Data::Dump 'dd'; Forking::Amazing::run( 5, # max forks sub { +{id => pop, pid => $$} }, # runs in child sub {dd pop}, # process result of child in pare +nt 'a'..'z'); # ids (one fork for each id) =cut
The module name may change in the future. :)
Here's code using that module that does grouping of job ids.
The id passed to the child is now an anon array of job ids, and a child now returns
an anon array of results.
#!/usr/bin/perl use strict; use warnings;; use Forking::Amazing; use Data::Dump 'dd'; use Time::HiRes qw(time); my $groupsize = 1000; my @rawids = 'job0001' .. 'job9999'; my @ids; push @ids, [ splice @rawids, 0, $groupsize ] while @rawids; my @answers; my $start = time; Forking::Amazing::run 20, sub { [ map +{id => $_, pid => $$, time => time - $start}, @{+shift} + ] }, sub { push @answers, @{+pop} }, @ids; my $end = time - $start; dd \@answers; print "forking time $end\n";
It's a significant speed up :)
Note that I moved the dd out of the timing loop, since the dd takes over 1.5 seconds to run on my machine and swamps the forking time.
In reply to Re^3: Parallel::ForkManager and multiple datasets
by tybalt89
in thread Parallel::ForkManager and multiple datasets
by Speed_Freak
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |