
Hi marioroy,

As you said, it's apples to oranges, so let's try to be more appleish (or is it orangeish?).

My first attempt was to group job ids so that each child handles more than one job during its lifetime. It turns out to be fairly simple.

Right after I posted Re: Parallel::ForkManager and multiple datasets, I realized I had written roughly the same forking code several times, so it was time to move it to a module.

Here's the module. It uses callbacks for the child code and for the parent code that processes the child's returned value.

package Forking::Amazing;

sub run ($&&@)
  {
  my ( $maxforks, $childcallback, $resultcallback, @ids ) = @_;

  use Storable qw( freeze thaw );
  use IO::Select;
  my %fh2id;
  my $sel = IO::Select->new;
  while( @ids or $sel->count ) # unstarted or active
    {
    while( @ids and $sel->count < $maxforks ) # start all forks allowed
      {
      my $id = shift @ids;
      my $pid = open my $fh, '-|'; # forking open
      defined $pid or die "cannot fork: $!"; # undef means the fork failed
      if( $pid )
        {
        $sel->add( $fh ); # parent
        $fh2id{$fh} = $id;
        }
      else # child code goes here
        {
        print freeze $childcallback->($id);
        exit;
        }
      }
    for my $fh ( $sel->can_read ) # collecting child data
      {
      $sel->remove( $fh );
      $resultcallback->($fh2id{$fh}, thaw do { local $/; <$fh> });
      }
    }
  }

1;

__END__

=head1 EXAMPLE program
  
 use Forking::Amazing;               # small example program
 use Data::Dump 'dd';
 Forking::Amazing::run(
   5,                                # max forks
   sub { +{id => pop, pid => $$} },  # runs in child
   sub {dd pop},                     # process result of child in parent
   'a'..'z');                        # ids (one fork for each id)

=cut

The module name may change in the future. :)
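In case the forking open is unfamiliar, here's a minimal sketch of a single parent/child round trip, the same mechanism the module uses but without IO::Select or any limit on forks. The helper name run_one is made up for this illustration and is not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw( freeze thaw );

# run_one is a hypothetical helper for illustration only:
# run one callback in a forked child and hand its result to the parent.
sub run_one
  {
  my ( $callback, $id ) = @_;
  my $pid = open my $fh, '-|';           # fork; child's STDOUT feeds $fh
  defined $pid or die "cannot fork: $!";
  if( $pid == 0 )                        # child
    {
    print freeze( $callback->($id) );    # serialize the returned reference
    exit;
    }
  my $frozen = do { local $/; <$fh> };   # parent: read until child is done
  close $fh;                             # waits for the child to exit
  return thaw $frozen;
  }

my $result = run_one( sub { +{ id => shift, pid => $$ } }, 'a' );
print "id $result->{id} was handled by pid $result->{pid} (parent is $$)\n";
```

The callback has to return a reference, since that's what freeze expects. The module adds IO::Select on top of this so that several children can run at once and be collected as they finish.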

Here's code using that module that does grouping of job ids.
The id passed to the child is now an anon array of job ids, and a child now returns an anon array of results.

#!/usr/bin/perl

use strict;
use warnings;
use Forking::Amazing;
use Data::Dump 'dd';
use Time::HiRes qw(time);

my $groupsize = 1000;
my @rawids = 'job0001' .. 'job9999';
my @ids;
push @ids, [ splice @rawids, 0, $groupsize ] while @rawids;

my @answers;
my $start = time;

Forking::Amazing::run
  20,
  sub { [ map +{id => $_, pid => $$, time => time - $start}, @{+shift} ] },
  sub { push @answers, @{+pop} },
  @ids;

my $end = time - $start;
dd \@answers;
print "forking time $end\n";

It's a significant speed up :)

Note that I moved the dd out of the timed section, since dd takes over 1.5 seconds to run on my machine and would swamp the forking time.
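The grouping step can be tried on its own, without any forking. This is only a sketch: the inner map is a stand-in for the child callback, and the variable names @groups and @answers are chosen just for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $groupsize = 1000;
my @rawids = 'job0001' .. 'job9999';

# Cut the flat id list into anon arrays of at most $groupsize ids each
my @groups;
push @groups, [ splice @rawids, 0, $groupsize ] while @rawids;

printf "%d groups, first has %d ids, last has %d ids\n",
  scalar @groups, scalar @{ $groups[0] }, scalar @{ $groups[-1] };

# Stand-in for the child callback (no fork here): one result per id,
# then flattened back into a single list as the parent callback does
my @answers = map { my $group = $_; map +{ id => $_ }, @$group } @groups;

printf "%d answers, from %s to %s\n",
  scalar @answers, $answers[0]{id}, $answers[-1]{id};
```

With these numbers the 9999 ids end up in 10 groups, so the run above starts only 10 children even though 20 forks are allowed. A smaller $groupsize would be needed to keep all 20 busy.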

In reply to Re^3: Parallel::ForkManager and multiple datasets by tybalt89
in thread Parallel::ForkManager and multiple datasets by Speed_Freak
