Hi all,

I'm writing to suggest MCE as one option. 4 million is a big number, so I will describe a couple of ways (non-chunking and chunking).

First, chunk_size => 1

    use MCE::Loop max_workers => 4, chunk_size => 1;

    ## non-chunking takes 2m18s to complete
    mce_loop { MCE->say($_); } 1..4_000_000;

Next, chunk_size => 'auto'

    use MCE::Loop max_workers => 4, chunk_size => 'auto';

    ## chunking takes 0m12s to complete (IPC becomes 11.5x faster)
    mce_loop {
       my ($mce, $chunk_ref, $chunk_id) = @_;
       my @o;
       for (@{ $chunk_ref }) {
          push @o, $_;
       }
       MCE->say(@o);
    } 1..4_000_000;
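To get a feel for why chunking cuts IPC so sharply, here is a rough sketch in plain Perl (no MCE needed). It only counts how many chunks are needed to cover the input, on the simplifying assumption that each chunk costs roughly one round trip between a worker and the manager process. The helper name chunk_count and the chunk size of 8_000 are made up for illustration; 8_000 is not claimed to be what chunk_size => 'auto' actually picks.

```perl
use strict;
use warnings;

# Number of chunks needed to cover $items elements when each chunk
# holds up to $chunk_size elements (integer ceiling division).
sub chunk_count {
    my ($items, $chunk_size) = @_;
    return int( ($items + $chunk_size - 1) / $chunk_size );
}

my $items = 4_000_000;

# 8_000 is a hypothetical chunk size for illustration only; it is not
# claimed to be the value chosen by chunk_size => 'auto'.
for my $size (1, 8_000) {
    printf "chunk_size %5d: %7d chunks\n", $size, chunk_count($items, $size);
}
```

Under that assumption, chunk_size => 1 means 4,000,000 round trips, while a chunk size of 8_000 needs only 500.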

Finally, directly processing a file containing 4 million rows.

    use MCE::Loop max_workers => 4, chunk_size => 'auto';

    ## processing a file directly (mce_loop_f) takes 0m11.7s
    mce_loop_f {
       my ($mce, $chunk_ref, $chunk_id) = @_;
       chomp @{ $chunk_ref };
       my @o;
       for (@{ $chunk_ref }) {
          push @o, $_;
       }
       MCE->say(@o);
    } '/path/to/four_million_rows.txt';

But hold on... Much of the time comes from writing 4 million rows to STDOUT. Fasten your seatbelt. Rerunning with output directed to /dev/null. The time still includes MCE->say(...).

    ## array non-chunking...: 1m54.467s
    ## array auto-chunking..: 0m 0.843s   136x
    ## file  auto-chunking..: 0m 0.467s   245x

Running again with MCE->say(...) commented out to take that out of the equation. Am pleasantly surprised to see 4 million rows with chunk_size => 1 in just over 1 minute. Gosh, that is fast considering chunk_size => 1 (over 61k rows per second). However, chunking greatly reduces IPC. Furthermore, MCE can process an input file directly for even less overhead.

    ## array non-chunking...: 1m 5.458s
    ## array auto-chunking..: 0m 0.821s    80x
    ## file  auto-chunking..: 0m 0.411s   159x
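For anyone wanting to check the "over 61k per second" figure, here is a small sketch of the arithmetic in plain Perl. The helper name rows_per_second is made up for illustration; the numbers are the 4 million rows and the 1m 5.458s timing for chunk_size => 1 reported above.

```perl
use strict;
use warnings;

# Rows processed per second for a run over $rows rows taking $seconds.
sub rows_per_second {
    my ($rows, $seconds) = @_;
    return $rows / $seconds;
}

# chunk_size => 1 run with MCE->say commented out: 1m 5.458s = 65.458s
my $rate = rows_per_second(4_000_000, 65.458);
printf "%.0f rows per second\n", $rate;
```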

It's not fair... :) Part of that time is spent loading Perl itself and any modules. There is also the time to spawn 4 workers and shut them down at the end. I tested by adding MCE->last. The time needed is 0m0.074s. Therefore, the 0.411s above is really 0.337s.

    mce_loop_f {
       MCE->last;   # immediately leaves the block and input
       ...
    } '/path/to/four_million_rows.txt';

Well then, here are the times after subtracting 0.074s from the above, leaving the time needed for IPC only. Ha, still not able to break 1 minute for chunk_size => 1.

    ## array non-chunking...: 1m 5.384s
    ## array auto-chunking..: 0m 0.747s    88x
    ## file  auto-chunking..: 0m 0.337s   194x
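The subtraction and the speedup factors can be reproduced with a few lines of plain Perl. This is only a sketch of the arithmetic: the helper name ipc_seconds is made up, and the inputs are the wall-clock times with MCE->say commented out plus the 0.074s startup and shutdown cost measured with MCE->last.

```perl
use strict;
use warnings;

my $overhead = 0.074;   # seconds: Perl startup, spawning 4 workers, shutdown

# Wall-clock seconds with MCE->say commented out
my %wall = (
    'array non-chunking'  => 65.458,
    'array auto-chunking' => 0.821,
    'file auto-chunking'  => 0.411,
);

# Time left for IPC once the fixed overhead is removed.
sub ipc_seconds { return $_[0] - $overhead }

# The non-chunking run is the baseline for the speedup factors.
my $base = ipc_seconds($wall{'array non-chunking'});

for my $name ('array non-chunking', 'array auto-chunking', 'file auto-chunking') {
    my $ipc = ipc_seconds($wall{$name});
    printf "%-20s %7.3fs  %3.0fx\n", $name, $ipc, $base / $ipc;
}
```

The factors are rounded to the nearest whole number, which matches the 88x and 194x figures above.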

Chunking enables IPC to run many times faster due to less overhead.


In reply to Re: Splitting large array for threads. by marioroy
in thread Splitting large array for threads. by Anonymous Monk
