1nickt has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am processing a list of hashrefs with MCE::Loop and running into a problem when the list of hashrefs to be processed only contains one element. In this case it appears that MCE splits the hashref into its k=v pairs (how does it do that? how does it know it's a hashref?) and hands each one to a worker. Here's the test code:

use strict; use warnings; use feature 'say';
use Data::Dumper; ++$Data::Dumper::Sortkeys;

use MCE::Loop;
use MCE::Candy;

my $aref = [];

MCE::Loop::init(
    max_workers => 4,
    chunk_size  => 1,
    gather      => MCE::Candy::out_iter_array( $aref ),
);

for ( 0, 1 ) {
    say "Test $_";
    mce_loop {
        my ( $mce, $chunk_ref, $chunk_id ) = @_;
        warn "chunk_ref " . Dumper $chunk_ref;
        my ($chunk) = @{ $chunk_ref };
        warn "chunk " . Dumper $chunk;
        MCE->gather( $chunk_id, $chunk->{'foo'} );
    } @{ get_data( $_ ) };

    say "$_ : " . Dumper $aref;
    $aref = [];
}

sub get_data {
    my $which = shift;
    if ( $which == 1 ) {
        return [ { foo => 'bar', baz => 'qux' } ];
    }
    else {
        return [
            { foo => 'bar', baz => 'qux' },
            { foo => 'qux', baz => 'bar' },
        ];
    }
}
__END__
and the output:
Test 0
chunk_ref $VAR1 = [
          {
            'baz' => 'qux',
            'foo' => 'bar'
          }
        ];
chunk $VAR1 = {
          'baz' => 'qux',
          'foo' => 'bar'
        };
chunk_ref $VAR1 = [
          {
            'baz' => 'bar',
            'foo' => 'qux'
          }
        ];
chunk $VAR1 = {
          'baz' => 'bar',
          'foo' => 'qux'
        };
0 : $VAR1 = [
          'bar',
          'qux'
        ];
Test 1
chunk_ref $VAR1 = {
          'baz' => 'qux'
        };
Not an ARRAY reference at /tmp/mce.pl line 20, <__ANONIO__> line 3.
chunk_ref $VAR1 = {
          'foo' => 'bar'
        };
Not an ARRAY reference at /tmp/mce.pl line 20, <__ANONIO__> line 5.
1 : $VAR1 = [];

As you can see, it works as expected even when there are fewer elements than workers (as with two elements). But with one element, MCE appears to handle the input differently. Thanks for any ideas!


The way forward always starts with a minimal test.

Re: MCE::Loop with only one element in list
by marioroy (Prior) on Jun 01, 2017 at 04:10 UTC

    Hello 1nickt,

    Support for a hash ref as input was added recently. In your case, change the following line:

    @{ get_data( $_ ) }

    to this:

    get_data( $_ )

    Basically, give MCE the array ref directly so that it doesn't treat the array containing a single hash ref as input_data => hash_ref behind the scenes.

    Regards, Mario

    Update:

    In the OP's case, the array contains nested hash elements. MCE can take an array ref \@array to ensure the data is processed as an array.

    mce_loop { ... } \@array;
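To illustrate why the two call styles differ, here is a minimal sketch in plain Perl. The `describe_input` function is hypothetical and is not MCE's actual code; it only assumes, as the reply above suggests, that a lone hash ref argument is taken to mean hash input. Dereferencing a one-element array of hashrefs passes exactly one argument, a hash ref, so the callee cannot tell it apart from a hash ref given on purpose.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an input check; NOT MCE's real implementation.
sub describe_input {
    my @input = @_;
    return 'hash ref input'  if @input == 1 && ref $input[0] eq 'HASH';
    return 'array ref input' if @input == 1 && ref $input[0] eq 'ARRAY';
    return 'list of ' . scalar(@input) . ' items';
}

my $one = [ { foo => 'bar', baz => 'qux' } ];
my $two = [ { foo => 'bar', baz => 'qux' }, { foo => 'qux', baz => 'bar' } ];

my @results = (
    describe_input( @{ $two } ),    # two hashrefs arrive as a plain list
    describe_input( @{ $one } ),    # one hashref arrives alone
    describe_input( $one ),         # the array ref itself is unambiguous
);

print "$_\n" for @results;
```

Passing the array ref keeps the outer container visible to the callee, which is why the suggested change removes the ambiguity.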

      Thanks Mario, that change gives the following output:

      Test 0
      chunk_ref $VAR1 = [
                {
                  'baz' => 'qux',
                  'foo' => 'bar'
                }
              ];
      chunk $VAR1 = {
                'baz' => 'qux',
                'foo' => 'bar'
              };
      chunk_ref $VAR1 = [
                {
                  'baz' => 'bar',
                  'foo' => 'qux'
                }
              ];
      chunk $VAR1 = {
                'baz' => 'bar',
                'foo' => 'qux'
              };
      0 : $VAR1 = [
                'bar',
                'qux'
              ];
      Test 1
      chunk_ref $VAR1 = [
                {
                  'baz' => 'qux',
                  'foo' => 'bar'
                }
              ];
      chunk $VAR1 = {
                'baz' => 'qux',
                'foo' => 'bar'
              };
      1 : $VAR1 = [];
      ... in other words, it does not mangle the hashref when there's only one, but it does not gather the desired data to the shared aref.

      Thanks!



        Hello 1nickt,

        The gather option must be set each time when running MCE inside a loop. In that case, it's easier to use MCE::Flow instead, which allows one to pass a hash containing MCE options for the job.

        use strict; use warnings; use feature 'say';
        use Data::Dumper; ++$Data::Dumper::Sortkeys;

        use MCE::Flow;
        use MCE::Candy;

        my $aref = [];

        MCE::Flow::init(
            max_workers => 4,
            chunk_size  => 1,
        );

        for ( 0, 1 ) {
            say "Test $_";
            mce_flow {
                gather => MCE::Candy::out_iter_array( $aref )
            },
            sub {
                my ( $mce, $chunk_ref, $chunk_id ) = @_;
                # warn "chunk_ref " . Dumper $chunk_ref;
                my ($chunk) = @{ $chunk_ref };
                # warn "chunk " . Dumper $chunk;
                MCE->gather( $chunk_id, $chunk->{'foo'} );
            }, get_data( $_ );

            say "$_ : " . Dumper $aref;
            $aref = [];
        }

        sub get_data {
            my $which = shift;
            if ( $which == 1 ) {
                return [ { foo => 'bar', baz => 'qux' } ];
            }
            else {
                return [
                    { foo => 'bar', baz => 'qux' },
                    { foo => 'qux', baz => 'bar' },
                ];
            }
        }

        Output

        Test 0
        0 : $VAR1 = [
                  'bar',
                  'qux'
                ];
        Test 1
        1 : $VAR1 = [
                  'bar'
                ];

        Regards, Mario.

        Update:

        When gathering data and running inside a loop, it's important to specify the gather option each time. MCE::Candy::out_iter_array($aref) returns a closure containing a lexical order_id variable starting at 1. Basically, order_id correlates to chunk_id. Out-of-order items are held temporarily. Because order_id is never reset, a closure reused for a second run keeps waiting for a later chunk_id, and nothing from that run is gathered.

        sub out_iter_array {
            my $_aref = shift; my %_tmp; my $_order_id = 1;

            _croak("The argument to \"out_iter_array\" is not an array ref.")
                unless ( ref $_aref eq 'ARRAY' );

            return sub {
                my $_chunk_id = shift;

                if ( $_chunk_id == $_order_id && keys %_tmp == 0 ) {
                    # already orderly
                    $_order_id++; push @{ $_aref }, @_;
                }
                else {
                    # hold temporarily otherwise until orderly
                    @{ $_tmp{ $_chunk_id } } = @_;

                    while ( 1 ) {
                        last unless exists $_tmp{ $_order_id };
                        push @{ $_aref }, @{ delete $_tmp{ $_order_id++ } };
                    }
                }
            };
        }
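The effect of reusing one such closure across runs can be reproduced without MCE. The sketch below is a simplified imitation of the ordering logic quoted above, not the module's code: `make_ordered_gather` is a hypothetical name, and the calls simulate chunk IDs restarting at 1 on each run. It also shows a second, related effect: the closure holds a reference to the original array, so after `$aref = []` a stale closure would no longer write to the array that `$aref` names.

```perl
use strict;
use warnings;

# Hypothetical, simplified imitation of an ordered gather closure.
sub make_ordered_gather {
    my ($out) = @_;
    my %held;            # items waiting for their turn
    my $order_id = 1;    # next chunk_id allowed into the output

    return sub {
        my $chunk_id = shift;
        $held{$chunk_id} = [@_];
        # Flush every item that is now in sequence.
        while ( exists $held{$order_id} ) {
            push @{$out}, @{ delete $held{ $order_id++ } };
        }
    };
}

my $aref  = [];
my $stale = make_ordered_gather($aref);

# First run: chunks arrive out of order and are emitted in order.
$stale->( 2, 'qux' );
$stale->( 1, 'bar' );
my @first = @{$aref};

# Second run with the same closure: chunk IDs restart at 1,
# but the closure is still waiting for chunk 3.
$aref = [];
$stale->( 1, 'bar' );
my @second_stale = @{$aref};

# Second run with a fresh closure bound to the new array.
my $fresh = make_ordered_gather($aref);
$fresh->( 1, 'bar' );
my @second_fresh = @{$aref};

print "first run:          @first\n";
print "stale closure held: ", scalar(@second_stale), " items gathered\n";
print "fresh closure:      @second_fresh\n";
```

Creating the closure per run, as the MCE::Flow version does by passing gather with each call, avoids both the stale counter and the stale array reference.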