Hi all,
I have a question about using the MCE package.
I have the following workflow in mind:
First, I build a hash table by reading in a few key-value pair files, ultimately linking fileA's keys to fileC's values via fileB.
Next, using this hash table, I want to apply a subroutine to multiple input files and write the results to separate files.
I would like to do this second step in parallel.
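To make the table-building step concrete, here is a minimal sketch. It rests on assumptions that are not part of the workflow as described: each file holds tab-separated key/value lines, fileA maps its keys to fileB's keys, fileB maps those to fileC's keys, and fileC holds the final values. The helper `read_pairs`, the hash `%a_to_c`, and the in-memory sample data are hypothetical stand-ins for the real files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read tab-separated key/value lines from a filehandle into a hash ref.
sub read_pairs {
    my ($fh) = @_;
    my %pairs;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        next unless defined $key && defined $value;   # skip malformed lines
        $pairs{$key} = $value;
    }
    return \%pairs;
}

# In-memory stand-ins for fileA, fileB and fileC (made-up contents).
my %files = (
    A => "a1\tb1\na2\tb2\na3\tb9\n",
    B => "b1\tc1\nb2\tc2\n",
    C => "c1\tvalue one\nc2\tvalue two\n",
);

my %table;
for my $name (qw(A B C)) {
    open my $fh, '<', \$files{$name}
        or die "cannot open in-memory file $name: $!";
    $table{$name} = read_pairs($fh);
    close $fh;
}

# Link fileA's keys to fileC's values via fileB, skipping broken chains.
my %a_to_c;
for my $a_key (keys %{ $table{A} }) {
    my $b_key = $table{A}{$a_key};
    my $c_key = $table{B}{$b_key};
    next unless defined $c_key;
    my $c_val = $table{C}{$c_key};
    next unless defined $c_val;
    $a_to_c{$a_key} = $c_val;
}

print "$_ => $a_to_c{$_}\n" for sort keys %a_to_c;
```

Building the table once, before any workers are started, means forked workers can read it without having to rebuild it per input file.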
I use R a lot for parallel work, mainly the mclapply function from the parallel library.
Hunting around for parallel packages in Perl, I found prefork, MCE and fork manager.
I tried implementing the parallel portion with MCE as shown below.
(I mostly took the code from:
https://metacpan.org/pod/MCE::Examples)
#!/usr/bin/env perl
use v5.18;
use strict;
use warnings;
use autodie;
use MCE;
my @input_data = (0 .. 100 - 1);
## Make an output iterator for gather. Output order is preserved.
sub output_iterator {
    my %tmp; my $order_id = 1;
    return sub {
        my ($result, $chunk_id) = @_;
        $tmp{$chunk_id} = $result;
        while (1) {
            last unless (exists $tmp{$order_id});
            open my $output, '>', "/path/to/my/files/$chunk_id.txt";
            foreach (1..10) { print $output "\t",fibonacci($_)};
            say $output;
            close $output;
            delete $tmp{$order_id++};
        }
    };
}
## Use $chunk_ref->[0] or $_ to retrieve the element.
my $mce = MCE->new(
    chunk_size  => 1,                  # setting to 1 = do not chunk
    max_workers => 8,                  # number of CPU cores
    gather      => output_iterator(),  # callback in the manager process that receives each gathered result
);
MCE->foreach( \@input_data, sub {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    my $result = sqrt($chunk_ref->[0]);
    MCE->gather($result, $chunk_id);
});
sub fibonacci {
    my $n = shift;
    return undef if $n < 0;
    my $f;
    if ($n == 0) {
        $f = 0;
    } elsif ($n == 1) {
        $f = 1;
    } else {
        $f = fibonacci($n-1) + fibonacci($n-2);
    }
    return $f;
}
However, I noticed that the number of output files is not consistent with the size of my input, i.e. my array @input_data.
I get 96 output files, then 97, and finally 100 if I rerun the program without deleting the output files.
What's wrong?
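One possible explanation, offered as a guess rather than a confirmed diagnosis: the iterator advances $order_id inside the while loop but builds the file name from $chunk_id. When results arrive out of order, a single call may flush several pending results under the same file name, so some names are never written in that run. The sketch below shows the same ordering logic with the name derived from $order_id. To stay self-contained it does not use MCE or touch the disk: the `$write` callback and the `@flushed` array are hypothetical stand-ins for the file-writing step, and the out-of-order arrivals are simulated by hand.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the file-writing step: records which file
# names would be written, instead of opening "/path/to/my/files/$id.txt".
my @flushed;

sub output_iterator {
    my ($write) = @_;              # callback taking ($id, $result)
    my %tmp;
    my $order_id = 1;
    return sub {
        my ($result, $chunk_id) = @_;
        $tmp{$chunk_id} = $result; # hold the result until its turn comes
        # Flush every consecutive result that is now available.
        while (exists $tmp{$order_id}) {
            # Key point: name the output after $order_id, not $chunk_id.
            $write->($order_id, delete $tmp{$order_id});
            $order_id++;
        }
        return;
    };
}

my $gather = output_iterator(sub {
    my ($id, $result) = @_;
    push @flushed, "$id.txt";
});

# Simulate chunks arriving out of order, as can happen with parallel workers.
$gather->("result $_", $_) for (2, 1, 4, 3);

print join(',', @flushed), "\n";
```

With the name taken from $chunk_id instead, the same arrival sequence would write only under the names 1.txt and 3.txt, which would match the varying file counts between runs.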