1nickt has asked for the wisdom of the Perl Monks concerning the following question:
Learned brethren:
I am using MCE::Shared to create a shared hash for collecting results. I am using MCE::Loop to fork off workers and process a long list of tasks. (Note: I have tried the below code with Parallel::ForkManager instead, with the same results.)
The program works as expected: workers are forked, report their results to the shared hash, and the hash is printed from the END block by the parent process.
However, I would like to be able to interrupt the program and have the hash printed (on manual interrupt, and also on uncaught exception). In a single-process environment this works as expected: on interrupt, the hash is dumped from the END block in whatever state it has reached. But in a parallel-processing environment, I get unexpected results.
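For reference, here is a minimal sketch of the single-process behaviour I mean. It is not part of my test program: the child script and the `interrupt_child` helper are made up for illustration, and it uses only core Perl. It suggests that END runs on interrupt only when a handler turns the signal into a normal `exit`; with the default disposition the process is killed outright.

```perl
use strict;
use warnings;

# Tiny stand-alone script run in a child perl (illustrative only).
my $child_script = <<'PERL';
$| = 1;
$SIG{INT} = sub { exit 0 } if $ARGV[0];   # turn the signal into a normal exit
END { print "END block ran\n" }
print "ready\n";
sleep 10;
PERL

# Start the child, interrupt it, and return whatever it printed while dying.
sub interrupt_child {
    my ($with_handler) = @_;
    my $pid = open(my $out, '-|', $^X, '-e', $child_script, $with_handler)
        or die "cannot start child: $!";
    my $ready = <$out>;        # wait until the child is up and running
    kill 'INT', $pid;
    my @rest = <$out>;         # output produced after the signal, if any
    close $out;                # reaps the child; its status is not checked here
    return join '', @rest;
}

my $with    = interrupt_child(1);
my $without = interrupt_child(0);
print "with handler:    ", ($with    || "(nothing)\n");
print "without handler: ", ($without || "(nothing)\n");
```

With the handler installed the child reports that its END block ran; without it, nothing is printed after the signal.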
Here is my test program:
use strict; use warnings; use feature 'say';
use Data::Dumper; ++$Data::Dumper::Sortkeys;
use Time::HiRes qw/ usleep time /;
use MCE::Shared;
use MCE::Loop;
my $pid = $$;
say "PID $pid";
tie my %hash, 'MCE::Shared', ();
$SIG{'INT'} = sub { kill 'TERM', -$$ };
$SIG{'TERM'} = sub { exit 0 };
MCE::Loop->init( max_workers => 6, chunk_size => 10 );
mce_loop {
say "Forked child with $$";
my ( $mce, $chunk_ref, $chunk_id ) = @_;
for ( @{ $chunk_ref } ) {
$hash{ sprintf '%.2d %s', $_, $$ } = time;
sleep 1;
}
} ( 0 .. 99 );
MCE::Loop->finish;
END {
say sprintf '%s %s (%s) in END', $$, time, $$ == $pid ? 'Parent' : 'Child';
if ( $$ == $pid ) {
say 'Parent is ready to dump';
say 'Dumping: ' . Dumper \%hash;
}
}
__END__
Here's an example of the output I am getting from my test program:
PID 13209
Forked child with 13211
Forked child with 13212
Forked child with 13215
Forked child with 13214
Forked child with 13213
Forked child with 13216
^C13211 1491567754.05168 (Child) in END
13216 1491567754.05169 (Child) in END
13213 1491567754.05382 (Child) in END
13212 1491567754.05404 (Child) in END
13214 1491567754.05448 (Child) in END
13209 1491567754.05601 (Parent) in END
Parent is ready to dump
13215 1491567754.05627 (Child) in END
^C
Three things strike me as odd about this:
- Why is a child process surviving longer than the parent with kill 'TERM', -$$ ? (Note: this is not always the case. Sometimes the parent dies last; sometimes there are multiple children surviving longer than the parent.)
- Why is a second interrupt signal needed here? If not given, the program just hangs after printing 'Parent is ready to dump'. (This is always the case.)
- Why is the line containing the Dumper() statement not executed?
Comparing with a Parallel::ForkManager script that doesn't use a shared variable (still shows child processes surviving longer than the parent, but exits completely with one interrupt signal):
use strict; use warnings; use feature 'say';
use Data::Dumper; ++$Data::Dumper::Sortkeys;
use Parallel::ForkManager;
my $pid = $$;
say "PID $pid";
$SIG{'INT'} = sub { kill 'TERM', -$$ };
$SIG{'TERM'} = sub { exit 0 };
my $pm = Parallel::ForkManager->new(6);
for ( 0 .. 9 ) {
my $start = 10 * $_;
$pm->start and next;
for ( $start .. $start + 9 ) {
say sprintf '%.2d %s', $_, $$;
sleep 1;
}
$pm->finish;
}
END {
say sprintf '%s (%s) in END', $$, $$ == $pid ? 'Parent' : 'Child';
}
__END__
PID 14274
00 14275
10 14276
20 14277
30 14278
40 14279
50 14280
01 14275
11 14276
21 14277
31 14278
41 14279
51 14280
02 14275
12 14276
22 14277
32 14278
42 14279
52 14280
^C43 14279
33 14278
23 14277
13 14276
03 14275
53 14280
14279 (Child) in END
14280 (Child) in END
14277 (Child) in END
14275 (Child) in END
14276 (Child) in END
14274 (Parent) in END
14278 (Child) in END
I realize this may be more of a fork question than shared data, but the real problem is only manifesting when trying to use the shared data structure. Thanks for any pointers.
The way forward always starts with a minimal test.
Re: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by Anonymous Monk on Apr 07, 2017 at 13:28 UTC
# $SIG{'INT'} = sub { kill 'TERM', -$$ }; # this works already
$SIG{'TERM'} = sub { MCE->exit(0) }; # Notifies the parent
This is another possibility.
$SIG{'TERM'} = sub {
if (MCE->wid > 0) {
# worker
MCE->exit(0);
} else {
# parent
MCE::Signal::stop_and_exit('TERM');
}
};
Thank you for the reply ( marioroy? )!
In the first example, it was necessary to move the SIG handlers below the mce_loop statement; otherwise it gave this error:
MCE::exit: method is not allowed by the manager process at mce.pl line 19.
After moving the SIG handlers, it gives no error but does not appear to execute any of the code in END:
PID 18566
Forked child with 18568
Forked child with 18569
Forked child with 18571
Forked child with 18572
Forked child with 18570
Forked child with 18573
^C
## mce.pl: caught signal (INT), exiting
Killed
The second example you provided gives the same result:
PID 18874
Forked child with 18877
Forked child with 18876
Forked child with 18879
Forked child with 18878
Forked child with 18880
Forked child with 18881
^C
## mce3.pl: caught signal (INT), exiting
Killed
Note that the program runs successfully if left to complete:
PID 19017
Forked child with 19019
Forked child with 19020
Forked child with 19023
Forked child with 19022
Forked child with 19024
Forked child with 19021
Forked child with 19019
Forked child with 19020
Forked child with 19022
Forked child with 19023
19020 1491572765.74874 (Child) in END
19019 1491572765.75111 (Child) in END
19024 1491572765.75325 (Child) in END
19022 1491572765.7533 (Child) in END
19021 1491572765.7552 (Child) in END
19023 1491572765.75711 (Child) in END
19017 1491572765.75939 (Parent) in END
Parent is ready to dump
Dumping: $VAR1 = {
'00 19019' => '1491572745.74377',
'01 19019' => '1491572746.74401',
'02 19019' => '1491572747.74421',
'03 19019' => '1491572748.74441',
'04 19019' => '1491572749.74457',
'05 19019' => '1491572750.74473',
'06 19019' => '1491572751.74493',
'07 19019' => '1491572752.7451',
'08 19019' => '1491572753.74529',
'09 19019' => '1491572754.7455',
'10 19020' => '1491572745.7436',
'11 19020' => '1491572746.74385',
'12 19020' => '1491572747.74405',
'13 19020' => '1491572748.74427',
...
}
Thank you again for the help.
| [reply] [d/l] [select] |
|
$SIG{'TERM'} = sub {
if (MCE->wid > 0) {
# worker
MCE->exit(0);
} else {
# parent
MCE::Signal::stop_and_exit('TERM');
}
};
Re: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by Anonymous Monk on Apr 07, 2017 at 13:40 UTC
Oh yes, one might want to display the shared data. This will do.
$SIG{'TERM'} = sub {
if (MCE->wid > 0) {
# worker
MCE->exit(0);
} else {
# parent
say 'Parent is ready to dump';
say 'Dumping: ' . Dumper \%hash;
MCE::Signal::stop_and_exit('TERM');
}
};
Re: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by Anonymous Monk on Apr 07, 2017 at 13:46 UTC
For the INT handler, one can do this. I understand now. You want to display the data already stored in the shared cache.
$SIG{'INT'} = sub {
if ( tied(%hash)->len ) {
(MCE->wid == 0)
? say 'Parent is ready to dump'
: say 'Worker is ready to dump';
say 'Dumping: ' . Dumper \%hash;
%hash = ();
}
MCE::Signal::stop_and_exit('INT');
};
$SIG{'TERM'} = sub {
if (MCE->wid > 0) {
# worker
MCE->exit(0);
} else {
# parent
say 'Parent is ready to dump';
say 'Dumping: ' . Dumper \%hash;
MCE::Signal::stop_and_exit('TERM');
}
};
| [reply] [d/l] |
|
use strict; use warnings; use feature 'say';
use Data::Dumper; ++$Data::Dumper::Sortkeys;
use MCE::Shared;
use MCE::Loop;
$|++;
my $pid = $$; say "PID $pid";
tie my %hash, 'MCE::Shared', ();
MCE::Loop->init( max_workers => 2, chunk_size => 1 );
mce_loop {
my ( $mce, $chunk_ref, $chunk_id ) = @_;
say sprintf 'Forked worker in slot %s with pid %s for chunk %s', MCE->wid, MCE->pid, $chunk_id;
for ( @{ $chunk_ref } ) {
$hash{ sprintf '%.2d %s', $_, $$ } = time;
say "After $_: " . Dumper \%hash;
sleep 1;
}
} ( 0 .. 4 );
MCE::Loop->finish;
$SIG{'INT'} = sub {
say 'Hello from INT';
if ( tied(%hash)->len ) {
(MCE->wid == 0) ? say 'Parent is ready to dump'
: say 'Worker is ready to dump';
say 'Dumping: ' . Dumper \%hash;
%hash = ();
}
MCE::Signal::stop_and_exit('INT');
};
$SIG{'TERM'} = sub {
say 'Hello from TERM';
if (MCE->wid > 0) { # worker
MCE->exit(0);
} else { # parent
say 'Parent is ready to dump';
say 'Dumping: ' . Dumper \%hash;
MCE::Signal::stop_and_exit('TERM');
}
};
END {
say sprintf '%s %s (%s) in END', $$, time, $$ == $pid ? 'Parent' : 'Child';
if ( MCE->wid == 0 or $$ == $pid ) {
say "Parent is ready to dump";
say 'Dumping: ' . Dumper \%hash;
}
}
__END__
Output when interrupted:
PID 21106
Forked worker in slot 2 with pid 21110 for chunk 1
Forked worker in slot 1 with pid 21109 for chunk 2
After 0: $VAR1 = {
'00 21110' => '1491574316',
'01 21109' => '1491574316'
};
After 1: $VAR1 = {
'00 21110' => '1491574316',
'01 21109' => '1491574316'
};
^C
## mce3.pl: caught signal (INT), exiting
Killed
Output without interrupt:
PID 20939
Forked worker in slot 2 with pid 20942 for chunk 1
Forked worker in slot 1 with pid 20941 for chunk 2
After 0: $VAR1 = {
'00 20942' => '1491574178',
'01 20941' => '1491574178'
};
After 1: $VAR1 = {
'00 20942' => '1491574178',
'01 20941' => '1491574178'
};
Forked worker in slot 2 with pid 20942 for chunk 3
Forked worker in slot 1 with pid 20941 for chunk 4
After 2: $VAR1 = {
'00 20942' => '1491574178',
'01 20941' => '1491574178',
'02 20942' => '1491574179',
'03 20941' => '1491574179'
};
After 3: $VAR1 = {
'00 20942' => '1491574178',
'01 20941' => '1491574178',
'02 20942' => '1491574179',
'03 20941' => '1491574179'
};
Forked worker in slot 2 with pid 20942 for chunk 5
After 4: $VAR1 = {
'00 20942' => '1491574178',
'01 20941' => '1491574178',
'02 20942' => '1491574179',
'03 20941' => '1491574179',
'04 20942' => '1491574180'
};
20941 1491574181 (Child) in END
20942 1491574181 (Child) in END
20939 1491574181 (Parent) in END
Parent is ready to dump
Dumping: $VAR1 = {
'00 20942' => '1491574178',
'01 20941' => '1491574178',
'02 20942' => '1491574179',
'03 20941' => '1491574179',
'04 20942' => '1491574180'
};
Thanks again for your help.
Why, when chunk_size is set to `1`, is the same process reused?
Workers persist from start to finish. chunk_size refers to how many items a given worker receives per user_func.
user_begin
user_func
user_func
user_func
...
user_func
user_end
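The call sequence above can be imitated with a plain fork. This is only a sketch of the lifecycle, not how MCE is implemented: it uses a single worker, and the `user_begin`, `user_func` and `user_end` labels are just strings printed for illustration. The point is that one process (one PID) handles every chunk.

```perl
use strict;
use warnings;

my @items      = (0 .. 4);
my $chunk_size = 1;                  # plays the role of MCE's chunk_size

pipe(my $reader, my $writer) or die "pipe: $!";

my $worker = fork();
die "fork: $!" unless defined $worker;

if ($worker == 0) {                  # the single, persistent worker
    close $reader;
    print {$writer} "$$ user_begin\n";             # once, when the worker starts
    my @todo = @items;
    while (my @chunk = splice @todo, 0, $chunk_size) {
        print {$writer} "$$ user_func(@chunk)\n";  # once per chunk
    }
    print {$writer} "$$ user_end\n";               # once, before the worker exits
    close $writer;
    exit 0;
}

close $writer;
my @trace = <$reader>;               # everything the worker reported
waitpid $worker, 0;
print @trace;
```

Every line of the trace carries the same PID, which is why the same process shows up for several chunks even with a chunk_size of 1.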
Why, when the interrupt is given, is nothing inside the SIG handlers being executed?
The overriding of SIG handlers must be done before calling mce_loop or before workers are spawned.
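A plain-fork sketch (core Perl only, no MCE; the messages are made up) shows why the ordering matters: a forked process starts with a copy of whatever handlers were installed at fork time, and handlers installed in the parent afterwards never reach it.

```perl
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe: $!";

# Installed BEFORE fork: the child starts life with a copy of this handler.
$SIG{TERM} = sub {
    syswrite $writer, "process $$ ran the inherited TERM handler\n";
    exit 0;
};

my $child = fork();
die "fork: $!" unless defined $child;

if ($child == 0) {          # child
    close $reader;
    sleep 10;               # interrupted when TERM arrives
    exit 1;                 # only reached if no signal came
}

# Installed AFTER fork: affects the parent only; the child never sees it.
$SIG{TERM} = sub { print "parent-only handler\n" };

close $writer;
kill 'TERM', $child;
my $report = <$reader>;     # line written by the child's inherited handler
waitpid $child, 0;
my $status = $? >> 8;
print $report;
print "child exit status: $status\n";
```

The child reports from the handler it inherited and exits with status 0; the handler the parent installed later is never called in the child.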
What is the purpose of emptying the hash in the INT signal handler?
The worker or parent that receives the signal displays the content and then clears the hash before notifying the others to exit, which causes the other workers to call the handler as well. Clearing the hash ensures the content is displayed only once.
Re: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by Anonymous Monk on Apr 07, 2017 at 18:13 UTC
Okay, I see the issue. The shared-manager process exits when it receives the signal, so it cannot respond to later requests, and the script stalls.
Inside MCE::Shared::Server.pm around line 461, comment out the handler code and add the subsequent line. Basically, the shared-manager must still respond to requests inside application handlers. I'm not sure if I can handle both cases. This handler was placed here in the event something killed the shared-manager process.
sub _loop {
$_is_client = 0;
# $SIG{HUP} = $SIG{INT} = $SIG{QUIT} = $SIG{TERM} = sub {
# $SIG{INT} = $SIG{$_[0]} = sub { };
#
# CORE::kill($_[0], $_is_MSWin32 ? -$$ : -getpgrp);
# for my $_i (1..15) { sleep 0.060 }
#
# CORE::kill('KILL', $$);
# CORE::exit(255);
# };
$SIG{HUP} = $SIG{INT} = $SIG{QUIT} = $SIG{TERM} = sub { };
...
}
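To illustrate why the manager has to survive the signal, here is a minimal sketch using a plain fork and a socketpair. It is not the MCE::Shared::Server code; the one-line request protocol, the `%store` contents and the variable names are assumptions made for the example. The "manager" installs empty handlers, so it keeps answering requests that arrive after the signal.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;

socketpair(my $app, my $mgr, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$_->autoflush(1) for $app, $mgr;

my $manager = fork();
die "fork: $!" unless defined $manager;

if ($manager == 0) {                       # stand-in for the shared-manager
    close $app;
    $SIG{INT} = $SIG{TERM} = sub { };      # swallow the signal, keep serving
    my %store = ('00 1234' => 'some value');
    print {$mgr} "ready\n";                # handlers are in place
    while (defined(my $key = <$mgr>)) {    # one key per line
        chomp $key;
        print {$mgr} ($store{$key} // ''), "\n";
    }
    exit 0;                                # the application closed its end
}

close $mgr;
my $ready = <$app>;                        # wait for the manager's handlers
kill 'TERM', $manager;                     # e.g. a group-wide kill on interrupt
print {$app} "00 1234\n";                  # request made after the signal
my $reply = <$app>;
chomp $reply;
close $app;
waitpid $manager, 0;
print "manager still answered: $reply\n";
```

If the manager exited from its handler instead, the read after the signal would get no answer, which is the kind of failure behind the stall described above.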
I've simplified the handling for the MCE script.
use strict; use warnings; use feature 'say';
use Data::Dumper; ++$Data::Dumper::Sortkeys;
use Time::HiRes 'sleep';
use MCE::Loop;
use MCE::Shared;
$|++;
my $pid = $$; say "PID $pid";
my $hash = MCE::Shared->hash();
$SIG{'INT'} = $SIG{'TERM'} = sub {
my $signal = shift; $SIG{'INT'} = $SIG{'TERM'} = sub {};
say "Hello from $signal: $$";
say 'Parent is ready to dump';
say 'Dumping: ' . Dumper $hash->export;
MCE::Signal::stop_and_exit('INT');
};
MCE::Loop->init(
max_workers => 2, chunk_size => 1, user_begin => sub {
$SIG{'INT'} = sub {
my $signal = shift;
say "Hello from $signal: $$";
MCE->exit(0);
};
$SIG{'TERM'} = sub {
my $signal = shift;
say "Hello from $signal: $$";
MCE::Signal::stop_and_exit($signal);
};
}
);
mce_loop {
my ( $mce, $chunk_ref, $chunk_id ) = @_;
say sprintf 'Forked worker in slot %s with pid %s for chunk %s', MCE->wid, MCE->pid, $chunk_id;
for ( @{ $chunk_ref } ) {
$hash->{ sprintf '%.2d %s', $_, $$ } = time;
say "After $_: " . Dumper $hash->export;
sleep 2;
}
} ( 0 .. 12 );
MCE::Loop->finish;
say "Parent is ready to dump";
say 'Dumping: ' . Dumper $hash->export;
| [reply] [d/l] [select] |
|
# $SIG{HUP} = $SIG{INT} = $SIG{QUIT} = $SIG{TERM} = sub {
# $SIG{INT} = $SIG{$_[0]} = sub { };
#
# CORE::kill($_[0], $_is_MSWin32 ? -$$ : -getpgrp);
# for my $_i (1..15) { sleep 0.060 }
#
# CORE::kill('KILL', $$);
# CORE::exit(255);
# };
There is no guarantee as to which END block Perl calls first. MCE::Shared has an END block that notifies the shared-manager to exit, and the script would stall if Perl called that one first. Therefore, leave the signal-handling bits intact at the application level. The END block below is not necessary; it is there simply to show workers entering it.
use strict; use warnings; use feature 'say';
use Data::Dumper; ++$Data::Dumper::Sortkeys;
use Time::HiRes 'sleep';
use MCE::Loop;
use MCE::Shared;
$|++;
my $pid = $$; say "PID $pid";
my $hash = MCE::Shared->hash();
$SIG{'INT'} = $SIG{'TERM'} = sub {
my $signal = shift; $SIG{'INT'} = $SIG{'TERM'} = sub {};
say "Hello from $signal: $$";
say 'Parent is ready to dump';
say 'Dumping: ' . Dumper $hash->export;
MCE::Signal::stop_and_exit('INT');
};
MCE::Loop->init(
max_workers => 2, chunk_size => 1, user_begin => sub {
$SIG{'INT'} = sub {
my $signal = shift;
say "Hello from $signal: $$";
MCE->exit(0);
};
$SIG{'TERM'} = sub {
my $signal = shift;
say "Hello from $signal: $$";
MCE::Signal::stop_and_exit($signal);
};
}
);
mce_loop {
my ( $mce, $chunk_ref, $chunk_id ) = @_;
say sprintf 'Forked worker in slot %s with pid %s for chunk %s', MCE->wid, MCE->pid, $chunk_id;
for ( @{ $chunk_ref } ) {
$hash->{ sprintf '%.2d %s', $_, $$ } = time;
say "After $_: " . Dumper $hash->export;
sleep 3;
}
} ( 0 .. 12 );
MCE::Loop->finish;
END {
say "Hello from END block: $$";
if ($$ == $pid) {
say "Parent is ready to dump";
say 'Dumping: ' . Dumper $hash->export;
}
}
I will make a new MCE::Shared update after more testing. Thank you, 1nickt.
Hello Mario,
(BTW you mentioned that having the signal handlers as well as END may be overdone ... the reason I have the data dumped in END is so it prints out upon uncaught exception.)
I made the change to MCE::Shared::Server as instructed. I took your last script (in the post I am replying to) and modified it slightly. First, I took some of the debug statements away and cleaned up others. Second, the data is now dumped in INT only from the parent process (as you have it in END). This gives the following results:
With Ctrl-C, the END block is never reached by the parent, only by the children. But the parent dumps the data in INT.
perl mce9.pl
Parent PID 11581
worker 2 (11584) processing chunk 1
worker 1 (11583) processing chunk 2
worker 1 (11583) processing chunk 4
worker 2 (11584) processing chunk 3
^CHello from INT: 11584
Hello from INT: 11581
Hello from END block: 11584
Parent in INT: $VAR1 = bless( {
'00 11584' => '1491592596',
'01 11583' => '1491592596',
'02 11584' => '1491592598',
'03 11583' => '1491592598'
}, 'MCE::Shared::Hash' );
## mce9.pl: caught signal (INT), exiting
Hello from INT: 11583
Hello from END block: 11583
Killed
When running to completion, the parent reaches the END block and dumps the data there:
perl mce9.pl
Parent PID 11554
worker 2 (11557) processing chunk 1
worker 1 (11556) processing chunk 2
worker 2 (11557) processing chunk 3
worker 1 (11556) processing chunk 4
worker 1 (11556) processing chunk 6
worker 2 (11557) processing chunk 5
worker 1 (11556) processing chunk 7
Hello from END block: 11557
Hello from END block: 11556
Hello from END block: 11554
Parent in END: $VAR1 = bless( {
'00 11557' => '1491592580',
'01 11556' => '1491592580',
'02 11557' => '1491592582',
'03 11556' => '1491592582',
'04 11557' => '1491592584',
'05 11556' => '1491592584',
'06 11556' => '1491592586'
}, 'MCE::Shared::Hash' );
Regarding END blocks, I believe Perl does guarantee the order (unlike BEGIN), which is Last In First Out. So the END block in the script should be executed before the one in a module that is loaded by use.
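A quick way to check the ordering, using only core Perl (the child script is made up for the demonstration): two END blocks are defined in a small script, which is run in a separate perl so its output can be captured.

```perl
use strict;
use warnings;

# Stand-alone script with two END blocks, defined in this order.
my $child_script = <<'PERL';
END { print "END defined first\n" }
END { print "END defined second\n" }
print "main body finished\n";
PERL

open(my $out, '-|', $^X, '-e', $child_script)
    or die "cannot start child: $!";
my @lines = <$out>;
close $out;
print @lines;
```

This prints the main-body line, then the END block defined second, then the one defined first. A module pulled in with `use` near the top of a script is compiled before the script's own END block further down, so under this rule its END block would be expected to run after the script's.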
Code now:
use strict; use warnings; use feature 'say';
use Data::Dumper; ++$Data::Dumper::Sortkeys;
use MCE::Loop;
use MCE::Shared;
$|++;
my $pid = $$; say "Parent PID $pid";
my $hash = MCE::Shared->hash();
$SIG{'INT'} = $SIG{'TERM'} = sub {
my $signal = shift; $SIG{'INT'} = $SIG{'TERM'} = sub {};
say "Hello from $signal: $$";
if ( $$ == $pid ) {
say 'Parent in INT: ' . Dumper $hash->export;
}
MCE::Signal::stop_and_exit('INT');
};
MCE::Loop->init(
max_workers => 2, chunk_size => 1, user_begin => sub {
$SIG{'INT'} = sub {
my $signal = shift;
say "Hello from $signal: $$";
MCE->exit(0);
};
$SIG{'TERM'} = sub {
my $signal = shift;
say "Hello from $signal: $$";
MCE::Signal::stop_and_exit($signal);
};
}
);
mce_loop {
my ( $mce, $chunk_ref, $chunk_id ) = @_;
say sprintf 'worker %s (%s) processing chunk %s', MCE->wid, MCE->pid, $chunk_id;
for ( @{ $chunk_ref } ) {
$hash->{ sprintf '%.2d %s', $_, $$ } = time;
sleep 2;
}
} ( 0 .. 6 );
MCE::Loop->finish;
END {
say "Hello from END block: $$";
if ( $$ == $pid ) {
say 'Parent in END: ' . Dumper $hash->export;
}
}
Thank you again.
Re: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by Anonymous Monk on Apr 07, 2017 at 20:48 UTC
Try the Perl 6 concurrency model, it is much more powerful. ;)