
Okay, I have here stub code for a parallel "threadedreduce", written in what I think is pretty much a functional programming style.

My goal is along the same lines as my post yesterday about concatenating letters (mentioned in the OP), but more honed. Basically, I have a function that is slow for demo purposes, and I want to speed it up by running code in parallel, but in such a way that I can abstract the ugly bits (locks, threads, etcetera) into a module that I maintain separately.

So, here, my function is in the coderef $slow_matches_b. It's slow because it sleeps for one second. What it does is return true if the input contains the letter b.

I then have sub slowgrep, which should do the same thing that grep does, but in a more explicit way. slowgrep is built on sub reduce. This function is similar to the reduce you get in List::Util, but like Frank Sinatra, I did it my way here, so it may not be quite the same. I actually modeled it after the JavaScript reduce example code in the Joel on Software essay Can Your Programming Language Do This?.

Reduce is one of the basic building blocks of functional programming, and half of Google's celebrated MapReduce. This particular implementation of reduce executes serially.
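For comparison, here is a minimal sketch of the same grep-by-reduce idea written against List::Util's own reduce. Note the differences from the hand-rolled reduce in this post: List::Util's version takes a block that sees the accumulator as $a and the next element as $b, and it has no separate initial-value argument, so the empty array ref is simply placed at the front of the list. The variable names ($seed, @words) are only for illustration.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my @words = qw(blee blah bloo qoo fwee);

# The first list item acts as the starting accumulator.
my $seed = [];

# $a is the accumulator (an array ref), $b is the current word.
# The block must return the accumulator for the next step.
my $matches = reduce { push @$a, $b if $b =~ /b/; $a } $seed, @words;

print "@$matches\n";   # prints: blee blah bloo
```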

Then I have the sub fastgrep, which should also do the same thing that grep does, but be faster because it uses some form of parallelization. Every line of this function is the same as slowgrep, except it is based on threadedreduce instead of reduce.

threadedreduce should do the same thing as reduce except... it doesn't, as the tests show. In fact, it is just a stub, and actually it has forks, not threads, but you get the idea. I think this function could be implemented either with forks, or with threads, or with POE, or whatever your favorite technique is for cranking up parallelization in your environment. The point is that it lets programmers process lists in a parallel way while hiding away the complexity of the parallelized code.
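One likely reason a fork-based stub comes back empty: a forked child works on its own copy of the parent's memory, so anything it pushes onto the accumulator is lost when the child exits. The following is a small sketch using only core Perl (plain fork rather than Parallel::ForkManager) that shows the effect in isolation.

```perl
use strict;
use warnings;
use POSIX ();

my $result = [];    # accumulator, created in the parent

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child process: this push changes only the child's private copy.
    push @$result, 'blee';
    POSIX::_exit(0);    # leave immediately, skipping END blocks
}

waitpid($pid, 0);       # parent waits for the child to finish
print "parent sees ", scalar(@$result), " element(s)\n";   # prints: parent sees 0 element(s)
```

To get data back, the child has to send it to the parent explicitly, for example over a pipe, a socket, or a temporary file.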

This is where I am hoping the monastery can step in and help me out. Can someone out there make threadedreduce work? threadedreduce could then be used to build threadedgrep, threadedUserAgent (same as UserAgent::Parallel, but you can use WWW::Mechanize as the UA instead of the UA inherited from LWP::UserAgent)... or... well... lots of things. And in a way that's beautiful, functional, and maintainable.

 
$ ./test_reduce.pl
not ok 1 - parallel-y executing code works
#   Failed test 'parallel-y executing code works'
#   in ./test_reduce.pl at line 15.
fast matches: $VAR1 = [];
ok 2 - serially executing code works
slow matches: $VAR1 = [
          'blee',
          'blah',
          'bloo'
        ];
1..2
# Looks like you failed 1 test of 2.
$ cat test_reduce.pl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More qw( no_plan );
use Data::Dumper;
use Parallel::ForkManager;

my $slow_matches_b = sub { sleep 1; return 1 if $_[0] =~ /b/; };
my $test_strings = [ ('blee','blah','bloo', 'qoo', 'fwee' ) ];

my $fast_matches = fastgrep( $slow_matches_b, $test_strings );
ok( @$fast_matches, "parallel-y executing code works"  );
print "fast matches: " . Dumper($fast_matches);
# should dump out blee, blah, bloo, but not fwee or qoo

my $slow_matches = slowgrep( $slow_matches_b, $test_strings );
ok( @$slow_matches, "serially executing code works"  );
print "slow matches: " . Dumper($slow_matches);

sub fastgrep {
  my $test_function = shift;
  my $array = shift;

  my $grep_builder = sub {
    my $matches = shift;
    my $test_el = shift;
    push @$matches, $test_el if $test_function->($test_el);
    return $matches;
  };
  return threadedreduce ( $grep_builder, $array, [])
}

sub slowgrep {
  my $test_function = shift;
  my $array = shift;

  my $grep_builder = sub {
    my $matches = shift;
    my $test_el = shift;
    push @$matches, $test_el if $test_function->($test_el);
    return $matches;
  };
  return reduce( $grep_builder, $array, [])
}


# just a stub, hoping someone can help me with the threading
sub threadedreduce {
  my $function = shift;
  my $array = shift;
  my $init = shift;

  my $pm = Parallel::ForkManager->new(10);
  my $result = $init;
  for my $el ( @$array) {
    $pm->start and next;
    $result = $function->( $result, $el);
    $pm->finish;
  }
  $pm->wait_all_children;
  return $result;
}

sub reduce {
  my $function = shift;
  my $array = shift;
  my $init = shift;

  my $result = $init;
  for my $el ( @$array) {
    $result = $function->( $result, $el)
  }
  return $result;
}
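One possible direction, offered as a rough sketch rather than a finished threadedreduce: a reduce step depends on the accumulator from the previous step, so the fold itself is hard to run in parallel. What can run in parallel is the slow test on each element. The hypothetical forked_grep below (the name and structure are assumptions, not part of any module) forks one child per element using only core Perl, has each child report its verdict through a pipe, and then folds the verdicts together serially in the parent. Unlike the Parallel::ForkManager stub, it puts no cap on the number of children.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $test on each element in its own child process,
# then combine the per-element verdicts serially in the parent.
sub forked_grep {
    my ($test, $array) = @_;

    my @jobs;
    for my $el (@$array) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: do the slow work and send back a single character.
            close $reader;
            print {$writer} ($test->($el) ? 1 : 0);
            close $writer;          # flushes the verdict into the pipe
            POSIX::_exit(0);        # exit without running END blocks
        }

        # Parent: keep only the read end and start the next child.
        close $writer;
        push @jobs, { pid => $pid, reader => $reader, el => $el };
    }

    # Collect in input order, so matches come out ordered like grep's.
    my @matches;
    for my $job (@jobs) {
        my $verdict = readline($job->{reader});
        close $job->{reader};
        waitpid($job->{pid}, 0);
        push @matches, $job->{el} if $verdict;
    }
    return \@matches;
}

my $slow_test = sub { sleep 1; return $_[0] =~ /b/ ? 1 : 0 };
my $matches = forked_grep($slow_test, [qw(blee blah bloo qoo fwee)]);
print "@$matches\n";   # prints: blee blah bloo
```

Because all five children sleep at the same time, the whole call should take roughly one second rather than five. For long lists, a cap on concurrent children (as Parallel::ForkManager provides) would be the sensible next step.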

UPDATE: I'm actually thinking the promising avenue of approach may be to use Parallel::Queue, along the lines diotalevi suggested yesterday in Re: using parallel processing to concatenate a string, where order of concatenation doesn't matter.

In reply to Re^2: Could there be a ThreadedMapReduce (instead of DistributedMapReduce)? by tphyahoo
in thread Could there be ThreadedMapReduce (and/or ForkedMapReduce) instead of DistributedMapReduce? by tphyahoo
