in reply to Re^5: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
in thread Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run

Hi Mario,

Yes, this code with the two changes to MCE::Shared::Server is working exactly as expected, both when run to completion and when interrupted with CTRL-C. Nice!

For completeness, I wanted to test what would happen with an uncaught exception, so I added a fatal operation to the loop:

for ( @{ $chunk_ref } ) {
    say $_ / 0 if $_ == 4;
... which resulted in the worker dying with the expected exception and message from Perl, while the manager and the other workers continued to completion, including updating the shared hash:
perl mce10.pl
Parent PID 29953
worker 2 (29957) processing chunk 1
worker 1 (29956) processing chunk 2
worker 2 (29957) processing chunk 3
worker 1 (29956) processing chunk 4
worker 1 (29956) processing chunk 6
worker 2 (29957) processing chunk 5
Illegal division by zero at mce10.pl line 27, <__ANONIO__> line 6.
Hello from END block: 29957
worker 1 (29956) processing chunk 7
Hello from END block: 29956
Hello from END block: 29953
Parent in END: $VAR1 = bless( {
    '00 29957' => '1491661589',
    '01 29956' => '1491661589',
    '02 29957' => '1491661591',
    '03 29956' => '1491661591',
    '05 29956' => '1491661593',
    '06 29956' => '1491661595'
}, 'MCE::Shared::Hash' );
... note the missing key for #4.

I think this is a good optional behaviour. But it would be nice for the default to be that an uncaught exception kills the whole program. One could choose to have the manager ignore an exception in a worker process via a switch of some kind (maybe an option to MCE::Signal?). In that case, though, I think it would be important to document the behaviour demonstrated above, so users know that the shared data cache will not necessarily contain all the expected data.

That way, by default, one could count on either the shared data structure being populated as expected, or an exception; a partially-populated data structure should only be provided on demand, and with a warning.
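In the meantime, one can guard against this by checking the shared hash for gaps before trusting it. A minimal pure-Perl sketch, assuming the key scheme from the demo output above (each key begins with a zero-padded chunk index, followed by the worker PID):

```perl
use strict;
use warnings;

# Results keyed as '<chunk-index> <pid>', as in the demo output above;
# chunk 4 is missing because that worker died mid-run.
my %results = (
    '00 29957' => 1491661589,
    '01 29956' => 1491661589,
    '02 29957' => 1491661591,
    '03 29656' => 1491661591,
    '05 29956' => 1491661593,
    '06 29956' => 1491661595,
);

my $expected = 7;    # number of chunks we dispatched

# Collect the chunk indices actually present ...
my %seen = map { 0 + (split ' ')[0] => 1 } keys %results;

# ... and report any gaps before using the data.
my @missing = grep { !$seen{$_} } 0 .. $expected - 1;
warn "incomplete results, missing chunks: @missing\n" if @missing;
```

This catches both early-exit gaps and out-of-sequence holes, since it checks every expected index rather than just the count.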

Thank you again.


The way forward always starts with a minimal test.

Re^7: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by marioroy (Prior) on Apr 09, 2017 at 06:45 UTC

    Hi 1nickt,

    To get the default behavior, one can specify the on_post_exit option. The exit status for __DIE__ is typically 255.

    MCE::Loop->init(
        max_workers => 2,
        chunk_size  => 1,
        user_begin  => sub {
            $SIG{'INT'} = sub {
                my $signal = shift;
                say "Hello from $signal: $$";
                MCE->exit(0);
            };
        },
        on_post_exit => sub {
            my ($mce, $e) = @_;
            if ($e->{status} == 255) {
                MCE::Signal::stop_and_exit('__DIE__');
            }
        }
    );

    More info on on_post_exit is found here. The die handler for MCE workers is found inside MCE::Core::Worker (~ line 649). I cannot change the MCE->exit(...) line to MCE::Signal::stop_and_exit('__DIE__'); that would break scripts where MCE is called from inside an eval block.

    local $SIG{__DIE__} = sub {
        ...
        local $SIG{__DIE__};
        local $\ = undef;
        my $_die_msg = (defined $_[0]) ? $_[0] : '';
        print {*STDERR} $_die_msg;
        $self->exit(255, $_die_msg, $self->{_chunk_id});
    };

    TODO: When on_post_exit is not specified, have MCE workers abort input processing on an uncaught exception. Revisit eval: I was unable to get $@ to stick at the manager level. To make this work, I need to call die at the manager level with the error obtained from the worker.

    eval {
        mce_loop { ... } @input;
    };

    # TODO: Today, $@ is not set at the manager level,
    # so the eval block succeeds. Will fix this.
    if ( $@ ) { ... }
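    To illustrate the intended fix, here is a plain-Perl sketch (MCE aside; run_workers is a hypothetical stand-in for the manager loop): when the manager re-throws the error captured from a worker, the caller's eval catches it as one would expect.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the MCE manager loop: it inspects each
# worker's (status, message) pair and, per the TODO above, re-throws
# an uncaught worker exception at the manager level.
sub run_workers {
    my @exits = @_;    # simulated worker exits: [ status, message ]
    for my $e (@exits) {
        my ($status, $msg) = @$e;
        die $msg if $status == 255;    # propagate to the caller's eval
    }
    return 'ok';
}

my $rv = eval {
    run_workers( [ 0, '' ], [ 255, "Illegal division by zero\n" ] );
};

# $@ now holds the worker's error, so the caller can react to it:
print "caught at manager level: $@" if $@;
```

    With that re-throw in place, the eval block around mce_loop in the TODO above would fail as users expect, instead of silently succeeding.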

    Fortunately, with the on_post_exit handler one has control over what to do next: e.g. restart_worker, or stop_and_exit.

      Hi ++marioroy,

      That's very cool. Now when one of the workers encounters a fatal exception the whole program ends.

      But, of course, one has no guarantee of which tasks will have been completed (especially in a real-world scenario where task execution time varies and there are more than two workers). So even with the program exiting via __DIE__, one can easily wind up not only with a partially-populated hash/cache (as expected on early exit), but with a hash populated *out of sequence* relative to the array being processed by mce_loop. E.g.:

      Parent PID 21987
      worker 2 (21990) processing chunk 1
      worker 1 (21989) processing chunk 2
      worker 1 (21989) processing chunk 4
      worker 2 (21990) processing chunk 3
      worker 1 (21989) processing chunk 6
      worker 2 (21990) processing chunk 5
      Illegal division by zero at mce12.pl line 35, <__ANONIO__> line 6.
      Hello from END block: 21990
      ## mce12.pl: caught signal (__DIE__), exiting
      Hello from INT: 21989
      Hello from END block: 21989
      Hello from END block: 21987
      Parent in END: $VAR1 = {
          '00 21990' => '1491767068',
          '01 21989' => '1491767068',
          '02 21990' => '1491767070',
          '03 21989' => '1491767070',
          '05 21989' => '1491767072'
      };
      That was of course completely foreseeable, but I hadn't thought about it when asking for default __DIE__ behaviour. Since it makes no difference with respect to the issue I first raised (patchy, incomplete results), I may actually be more likely to favour the previous behaviour; in other words, continue processing even when one worker dies unexpectedly. Haha, sorry! Well, I think both choices are valuable and needed, actually.

      Thanks again.


      The way forward always starts with a minimal test.