in reply to Parallel::ForkManager and multiple datasets
In each child process, build as complex a data structure as you wish, totally contained within the child block. Then, when you are done processing, serialise that structure with one of the many existing serialisers, e.g. Sereal:
my $complex_data_structure = {
    'a' => [1,2,3],
    'b' => { 'c' => [4,5,6], 'd' => LWP::UserAgent->new() },
};
my $serialised_data = Sereal::Encoder::encode_sereal($complex_data_structure);
$pfm->finish(0, \$serialised_data); # <<< note that we pass a reference to our serialised data
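If the structures get big, Sereal's object interface can also compress the payload for you. A minimal sketch, assuming a Sereal::Encoder recent enough to support the compress option (the SRL_ZLIB constant lives in Sereal::Encoder):

use Sereal::Encoder;

# build one encoder and reuse it; it zlib-compresses every payload it produces
my $encoder = Sereal::Encoder->new({ compress => Sereal::Encoder::SRL_ZLIB });
my $serialised_data = $encoder->encode($complex_data_structure);
$pfm->finish(0, \$serialised_data);

The decoding side does not change: the Sereal decoder detects and inflates the compression transparently.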
The callback registered with run_on_finish() is called every time a child finishes. There we de-serialise our data via $data_structure_reference, like this:
my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;
# de-reference the ref to the serialised data, then de-serialise it
my $data = Sereal::Decoder::decode_sereal($$data_structure_reference);
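One defensive detail worth adding: if a child exits without sending anything back (or dies before reaching finish()), $data_structure_reference arrives as undef, so guard the decode. A minimal sketch of the callback with that check:

$pfm->run_on_finish( sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;
    if (defined $data_structure_reference) {
        my $data = Sereal::Decoder::decode_sereal($$data_structure_reference);
        # ... store $data somewhere, e.g. keyed on $ident ...
    } else {
        warn "child $pid sent no data back (exit code: $exit_code)\n";
    }
});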
Below is something to get you started. Note a few points: 1) how to get the pid of the child, and 2) how to pass the data back via its reference. But the main point is that you serialise your complex data in the child into, say, a huge zipped string, and that is passed on to the parent process. I am not sure how well Sereal can handle references to objects created within the child and how well it can reconstitute them back in the parent (see the sketch after the script below).
#!/usr/bin/env perl

use strict;
use warnings;

use Parallel::ForkManager;
use Data::Dump qw/dump/; # bliako
use Sereal::Encoder qw(encode_sereal sereal_encode_with_object);
use Sereal::Decoder qw(decode_sereal sereal_decode_with_object);

my $threads = 20;
my $pfm     = Parallel::ForkManager->new($threads);

# each child's decoded payload lands here, keyed by child pid
my %results = ();

$pfm->run_on_finish( sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;
    my $data = Sereal::Decoder::decode_sereal($$data_structure_reference);
    # surely this is sequential code here, so no need to lock %results, right?
    $results{$pid} = $data;
    # using pid as key is not a good idea because a pid number may eventually be recycled.
});

my $things_hr = {
    'job1' => 'this is job 1 data',
    'job2' => 'this is job 2 data',
    'job3' => 'this is job 3 data',
    'job4' => 'this is job 4 data',
    'job5' => 'this is job 5 data',
};

THELOOP:
foreach my $thing (keys %{$things_hr}) {
    print "THING = $thing\n";
    $pfm->start and next THELOOP;  # parent continues the loop; the child falls through
    my $pid = $$;
    my $returned_data = {
        'item1' => "item1 from pid $pid, for item $thing and value ".$things_hr->{$thing},
        'item2' => "item2 from pid $pid, for item $thing and value ".$things_hr->{$thing},
        "item3 are some array refs for pid: $pid" => [1,2,3,4],
    };
    my $serialised_data = Sereal::Encoder::encode_sereal($returned_data);
    print "pid=$pid, this is what I am sending:\n".dump($returned_data)."\n";
    $pfm->finish(0, \$serialised_data);
}
$pfm->wait_all_children;
print "Here are the results:\n".dump(%results)."\n";
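On the object caveat above: Sereal can serialise blessed references, but anything holding live OS state in the child (an open socket or file handle inside, say, an LWP::UserAgent) will not come back usable in the parent. The safe pattern is to copy plain data out of the object and send only that. A minimal sketch of the child side, assuming use LWP::UserAgent; at the top and a reachable URL (both hypothetical here):

# child-side: extract plain scalars instead of shipping the object itself
my $ua       = LWP::UserAgent->new();
my $response = $ua->get('http://example.com');  # hypothetical request
my $report   = {
    status  => $response->code,             # plain scalar, serialises fine
    content => $response->decoded_content,  # plain scalar, serialises fine
    # NOT: agent => $ua  -- a live object would not survive the round trip
};
my $serialised = Sereal::Encoder::encode_sereal($report);
$pfm->finish(0, \$serialised);

As a design note: Parallel::ForkManager itself serialises whatever reference you hand to finish() (it uses Storable internally), so the explicit Sereal step is a choice rather than a requirement; it buys you Sereal's speed and optional compression at the cost of the extra encode/decode calls.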
bw, bliako