Hi marioroy,

As you said, it's apples to oranges, so let's try to be more appleish (or is it orangeish?).

First attempt was to group job ids so a child does more than one during its lifetime. Turns out to be fairly simple.

Right after I posted Re: Parallel::ForkManager and multiple datasets, I realized I had written roughly the same forking code several times, so it was time to move it to a module.

Here's the module. It uses callbacks for the child code and for the parent code that processes the child's returned value.

package Forking::Amazing; sub run ($&&@) { my ( $maxforks, $childcallback, $resultcallback, @ids ) = @_; use Storable qw( freeze thaw ); use IO::Select; my %fh2id; my $sel = IO::Select->new; while( @ids or $sel->count ) # unstarted or active { while( @ids and $sel->count < $maxforks ) # start all forks allowe +d { my $id = shift @ids; if( open my $fh, '-|' ) # forking open { $sel->add( $fh ); # parent $fh2id{$fh} = $id; } else # child code goes here { print freeze $childcallback->($id); exit; } } for my $fh ( $sel->can_read ) # collecting child data { $sel->remove( $fh ); $resultcallback->($fh2id{$fh}, thaw do { local $/; <$fh> }); } } } 1; __END__ =head1 EXAMPLE program use Forking::Amazing; # small example program use Data::Dump 'dd'; Forking::Amazing::run( 5, # max forks sub { +{id => pop, pid => $$} }, # runs in child sub {dd pop}, # process result of child in pare +nt 'a'..'z'); # ids (one fork for each id) =cut

The module name may change in the future. :)

Here's code using that module that does grouping of job ids.
The id passed to the child is now an anon array of job ids, and a child now returns an anon array of results.

#!/usr/bin/perl use strict; use warnings;; use Forking::Amazing; use Data::Dump 'dd'; use Time::HiRes qw(time); my $groupsize = 1000; my @rawids = 'job0001' .. 'job9999'; my @ids; push @ids, [ splice @rawids, 0, $groupsize ] while @rawids; my @answers; my $start = time; Forking::Amazing::run 20, sub { [ map +{id => $_, pid => $$, time => time - $start}, @{+shift} + ] }, sub { push @answers, @{+pop} }, @ids; my $end = time - $start; dd \@answers; print "forking time $end\n";

It's a significant speed up :)

Note that I moved the dd out of the timing loop, since the dd takes over 1.5 seconds to run on my machine and swamps the forking time.


In reply to Re^3: Parallel::ForkManager and multiple datasets by tybalt89
in thread Parallel::ForkManager and multiple datasets by Speed_Freak

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.