Re^4: Wait for individual sub processes

Out of order items from gathering are held temporarily until ordered items arrive.

So, if one of the early records takes an exceptionally long time to process, will all the outputs from records processed after it accumulate in memory until that record finally finishes, risking memory exhaustion?

If so, is there any mechanism, automated or manual, for detecting that memory accumulation and suspending chunk dispatch until the exceptionally slow record is processed and the output released?

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this

In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked


Replies are listed 'Best First'.

Re^5: Wait for individual sub processes
by marioroy (Prior) on Apr 25, 2015 at 14:12 UTC

The concern about memory utilization is valid. However, the file content is not gathered in the MCE example. Only the chunk id and file path are gathered. The output for each part stays inside the output directory until the items ahead of it in order have arrived; only then is it merged and unlinked.

  ...
  $mce->gather($chunk_id, "$part.out");
  ...
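
To make the point concrete, here is a minimal core-Perl sketch of the ordering logic, without MCE itself. The helper name make_ordered_gather and the emit callback are hypothetical, made up for illustration; this is not how MCE implements it internally. It assumes chunk ids are consecutive and start at 1. The thing to notice is that only the id and path are held for early arrivals, never the file content.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MCE: returns a callback that accepts
# ($chunk_id, $path) pairs in any order and passes them to $emit in
# chunk-id order. Only ids and paths are held, never file content.
sub make_ordered_gather {
    my ($emit) = @_;
    my %held;           # chunk_id => path, for chunks that arrived early
    my $next_id = 1;    # assumption: chunk ids start at 1

    return sub {
        my ($chunk_id, $path) = @_;
        $held{$chunk_id} = $path;

        # Release every consecutive chunk that is now available.
        while (exists $held{$next_id}) {
            $emit->($next_id, delete $held{$next_id});
            $next_id++;
        }
        return scalar keys %held;    # entries still waiting
    };
}

# Simulate chunk 1 being slow: chunks 3 and 2 finish first.
my @merged;
my $gather = make_ordered_gather(sub { push @merged, $_[1] });

my @pending;
for my $arrival ([3, '3.out'], [2, '2.out'], [1, '1.out'], [4, '4.out']) {
    push @pending, $gather->(@$arrival);
}

print "merged order: @merged\n";
print "held after each arrival: @pending\n";
```

In a real run the emit callback would be the place to append the part file to the final output and unlink it. What a slow early chunk costs is a growing number of small hash entries in memory plus part files on disk, which is why capping the file count is the remaining concern.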

The upcoming MCE 1.7 release adds an await method to MCE::Queue. In a new MCE::Cookbook.pod, I will demonstrate gathering $chunk_id and "$part.out" into a queue and having workers block temporarily. The idea is to keep no more than (200 + max_workers) files inside the output directory at any time.

   ...
   $q->enqueue( [ $chunk_id, "$part.out" ] );
   $q->await( 200 );  # blocks until the queue has 200 or fewer items
   ...

I will make sure an example is included before releasing 1.7.
