Hi all,

I'm writing to suggest MCE as one option. 4 million is a big number, so I will describe a few approaches (non-chunking and chunking).

First, chunk_size => 1

   use MCE::Loop max_workers => 4, chunk_size => 1;

   ## non-chunking takes 2m18s to complete

   mce_loop {
      MCE->say($_);

   } 1..4_000_000;

Next, chunk_size => 'auto'

   use MCE::Loop max_workers => 4, chunk_size => 'auto';

   ## chunking takes 0m12s to complete (IPC becomes 11.5x faster)

   mce_loop {
      my ($mce, $chunk_ref, $chunk_id) = @_;

      ## gather the chunk's output locally, then send it in one call
      my @o;
      for (@{ $chunk_ref }) {
         push @o, $_;
      }
      MCE->say(@o);

   } 1..4_000_000;

Finally, processing a file containing 4 million rows directly.

   use MCE::Loop max_workers => 4, chunk_size => 'auto';

   ## processing a file directly (mce_loop_f) takes 0m11.7s

   mce_loop_f {
      my ($mce, $chunk_ref, $chunk_id) = @_;
      chomp @{ $chunk_ref };

      ## gather the chunk's output locally, then send it in one call
      my @o;
      for (@{ $chunk_ref }) {
         push @o, $_;
      }
      MCE->say(@o);

   } '/path/to/four_million_rows.txt';

But hold on... Much of that time comes from writing 4 million rows to STDOUT. Fasten your seatbelt: here are the timings after rerunning with output redirected to /dev/null. The times still include the MCE->say(...) calls.

   ## array non-chunking...:  1m54.467s
   ## array auto-chunking..:  0m 0.843s   136x
   ## file  auto-chunking..:  0m 0.467s   245x
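For anyone checking the right-hand column: each factor is simply the non-chunking time divided by the chunked time. Here is a minimal sketch of that arithmetic in core Perl. The helper names to_seconds and speedup are mine for illustration and are not part of MCE.

```perl
use strict;
use warnings;

# Convert a time(1)-style string such as "1m54.467s" to seconds.
sub to_seconds {
    my ($t) = @_;
    my ($min, $sec) = $t =~ /^\s*(\d+)m\s*([\d.]+)s$/
        or die "unrecognized time: $t";
    return $min * 60 + $sec;
}

# Ratio of baseline time to candidate time, rounded to a whole number.
sub speedup {
    my ($baseline, $candidate) = @_;
    return sprintf '%.0f', to_seconds($baseline) / to_seconds($candidate);
}

printf "array auto-chunking: %sx\n", speedup('1m54.467s', '0m 0.843s');
printf "file  auto-chunking: %sx\n", speedup('1m54.467s', '0m 0.467s');
```

This reproduces the 136x and 245x figures from the timings above.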

Running again with MCE->say(...) commented out to take that out of the equation. I'm pleasantly surprised to see 4 million rows with chunk_size => 1 processed in just over 1 minute. Gosh, that is fast considering chunk_size => 1 (over 61k rows per second). However, chunking reduces the IPC overhead considerably. Furthermore, MCE can process an input file directly for even less overhead.

   ## array non-chunking...:  1m 5.458s
   ## array auto-chunking..:  0m 0.821s    80x
   ## file  auto-chunking..:  0m 0.411s   159x

It's not fair... :) Part of that time goes to loading Perl itself and any modules. There is also the time to spawn 4 workers and to shut them down at the end. I measured this by adding MCE->last; the time needed is 0m0.074s. Therefore, the 0.411s above is really 0.337s.

   mce_loop_f {
      MCE->last;    # immediately leaves the block and input
      ... 
   } '/path/to/four_million_rows.txt';

Well then, here are the times after subtracting 0.074s from the numbers above, leaving the time needed for IPC only. Ha, still not able to break 1 minute for chunk_size => 1.

   ## array non-chunking...:  1m 5.384s
   ## array auto-chunking..:  0m 0.747s    88x
   ## file  auto-chunking..:  0m 0.337s   194x
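The adjustment above can be sketched in a few lines of core Perl, which also gives rows per second for each run. This is only the arithmetic on the numbers already posted. The helper name net_seconds is hypothetical, and treating the 0.074s startup cost as a fixed amount for every run is my assumption.

```perl
use strict;
use warnings;

my $startup = 0.074;        # seconds, measured with MCE->last
my $rows    = 4_000_000;

# Wall-clock seconds from the run with MCE->say(...) commented out.
my @runs = (
    [ 'array non-chunking',  65.458 ],
    [ 'array auto-chunking',  0.821 ],
    [ 'file  auto-chunking',  0.411 ],
);

# Subtract the fixed startup cost from a measured time.
sub net_seconds {
    my ($measured, $overhead) = @_;
    return $measured - $overhead;
}

my $baseline = net_seconds($runs[0][1], $startup);
for my $run (@runs) {
    my ($label, $measured) = @$run;
    my $net = net_seconds($measured, $startup);
    printf "%-20s %7.3fs  %4.0fx  %9.0f rows/s\n",
        $label, $net, $baseline / $net, $rows / $net;
}
```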

Chunking enables IPC to run many times faster because of the lower overhead.
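To make the idea concrete without MCE, here is a minimal core-Perl sketch of what chunking amounts to: the input is handed out in batches, so the number of hand-offs drops from one per item to one per batch. chunk_list is a hypothetical helper, not MCE's API, and the chunk size of 3 is arbitrary. MCE's own chunking and IPC are more involved than this.

```perl
use strict;
use warnings;

# Split a list into array refs holding at most $size items each.
sub chunk_list {
    my ($size, @items) = @_;
    die "size must be >= 1" if $size < 1;
    my @chunks;
    push @chunks, [ splice @items, 0, $size ] while @items;
    return @chunks;
}

my @input  = (1 .. 10);
my @chunks = chunk_list(3, @input);

# One hand-off per chunk instead of one per item.
printf "%d items, %d hand-offs\n", scalar @input, scalar @chunks;
print "chunk: @$_\n" for @chunks;
```

With 10 items and a chunk size of 3, that is 4 hand-offs instead of 10. The same ratio applied to 4 million rows is where the savings come from.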

In reply to Re: Splitting large array for threads. by marioroy
in thread Splitting large array for threads. by Anonymous Monk
