My task at hand is to compress (zip) a number of files in a large number of directories. Parallelism will help because we have plenty of RAM and CPU, and speed matters.

I have code that partitions the directories into a number of groups and uses Parallel::ForkManager to process "N" groups in parallel. This works nicely. The problem with this solution, however, is that the children use the "zip" program to do the compression, e.g.:

open(PIPE, '-|:unix', '/usr/bin/zip', '-T', $zipfile, @to_zip);
If the main program gets a signal, the children and the zip grandchildren also get it, and the zip program can abort:
^C 08-Dec-2014 12:41:12 - Caught QUIT signal
======= Backtrace: =========
zipup.pl: Zipfile XXXXXXXX.zip did not test OK
/lib64/libc.so.6[0x397a476166]
/lib64/libc.so.6[0x397a478c93]
/lib64/libc.so.6(fclose+0x14d)[0x397a4667cd]
/usr/bin/zip[0x402eee]
/usr/bin/zip[0x403cce]
/usr/bin/zip[0x409566]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x397a41ed1d]
/usr/bin/zip[0x401fb9]
======= Memory map: ========
00400000-00432000 r-xp 00000000 fd:00 135862 /usr/bin/zip
00631000-00634000 rw-p 00031000 fd:00 135862 /usr/bin/zip
00634000-00682000 rw-p 00000000 00:00 0
00833000-00834000 rw-p 00033000 fd:00 135862 /usr/bin/zip
01a0f000-01a30000 rw-p 00000000 00:00 0 [heap]
3979c00000-3979c20000 r-xp 00000000 fd:00 1966151 /lib64/ld-2.12.so
3979e1f000-3979e20000 r--p 0001f000 fd:00 1966151 /lib64/ld-2.12.so
3979e20000-3979e21000 rw-p 00020000 fd:00 1966151 /lib64/ld-2.12.so
3979e21000-3979e22000 rw-p 00000000 00:00 0
397a400000-397a58b000 r-xp 00000000 fd:00 1966152 /lib64/libc-2.12.so
397a58b000-397a78a000 ---p 0018b000 fd:00 1966152 /lib64/libc-2.12.so
397a78a000-397a78e000 r--p 0018a000 fd:00 1966152 /lib64/libc-2.12.so
397a78e000-397a78f000 rw-p 0018e000 fd:00 1966152 /lib64/libc-2.12.so
397a78f000-397a794000 rw-p 00000000 00:00 0
397c000000-397c016000 r-xp 00000000 fd:00 1966220 /lib64/libgcc_s-4.4.7-20120601.so.1
397c016000-397c215000 ---p 00016000 fd:00 1966220 /lib64/libgcc_s-4.4.7-20120601.so.1
397c215000-397c216000 rw-p 00015000 fd:00 1966220 /lib64/libgcc_s-4.4.7-20120601.so.1
7fb65ef87000-7fb664e18000 r--p 00000000 fd:00 132167 /usr/lib/locale/locale-archive
7fb664e18000-7fb664e1b000 rw-p 00000000 00:00 0
7fb664e26000-7fb664e29000 rw-p 00000000 00:00 0
7fff56934000-7fff5694a000 rw-p 00000000 00:00 0 [stack]
7fff569ff000-7fff56a00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
zipup.pl: Zipfile XXXXXXY.zip did not test OK
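
To make the mechanism concrete: a ^C typed at the terminal is delivered to every process in the foreground process group, which is why the zip grandchildren see it too. Below is a minimal sketch of one way to keep the pipe but shield zip from such signals, by putting it into its own process group before the exec. The helper name open_isolated is hypothetical and is not part of zipup.pl, and I have not tried this against the real script.

```perl
use strict;
use warnings;
use POSIX qw(_exit);

# Hypothetical helper (not from zipup.pl): run @cmd with its STDOUT piped
# back to the caller, but in its own process group, so that a signal the
# terminal sends to our foreground process group does not reach it.
sub open_isolated {
    my @cmd = @_;

    # Forking pipe-open: returns the child's pid in the parent, 0 in the child.
    my $pid = open(my $fh, '-|');
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        setpgrp(0, 0);    # child: become leader of a new process group
        exec { $cmd[0] } @cmd or do {
            print STDERR "exec $cmd[0] failed: $!\n";
            _exit(127);   # leave without running the parent's END blocks
        };
    }

    # The parent sets the group as well, to close the race before the child
    # is scheduled. This fails harmlessly once the child has already exec'd.
    setpgrp($pid, $pid);

    return ($pid, $fh);
}

# Usage sketch, mirroring the zip call above:
#   my ($pid, $fh) = open_isolated('/usr/bin/zip', '-T', $zipfile, @to_zip);
#   my @output = <$fh>;
#   close($fh) or warn "zip exited with status $?";
```

The trade-off is that zip then no longer sees ^C at all, so the parent would have to forward a signal itself (for example kill 'TERM', $pid) if it wants the zips to stop early.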

So my question before the Wise is this: should I look into another process manager that allows for forks/execs within children (Parallel::MPM::Prefork seems to), use a Perl module to zip, or do something completely different? The reason I ask the Wise is that there seem to be many solutions out there--more than I have time to try--and the Wise will have experience from having tried some of them. (See Niels Bohr's definition of "Expert".) BTW, one of my goals is to keep the number of module dependencies to a minimum.


In reply to Task Partitioning and Parallelism Advice needed by co-jimbob
