co-jimbob has asked for the wisdom of the Perl Monks concerning the following question:
My task at hand is to compress (zip) up a number of files in a large number of directories. Parallelism will help because we have lots of RAM and CPU, and speed is needed.
I have code that partitions the directories into a number of groups and uses Parallel::ForkManager to process "N" groups in parallel. This works nicely. The problem with this solution is that the children use the external "zip" program to do the compression, e.g.:

```perl
open(PIPE, '-|:unix', '/usr/bin/zip', '-T', $zipfile, @to_zip);
```

If the main program gets a signal, the children and the zip grandchildren also get the signal, and the zip program can abort:
```
^C 08-Dec-2014 12:41:12 - Caught QUIT signal
======= Backtrace: =========
zipup.pl: Zipfile XXXXXXXX.zip did not test OK
/lib64/libc.so.6[0x397a476166]
/lib64/libc.so.6[0x397a478c93]
/lib64/libc.so.6(fclose+0x14d)[0x397a4667cd]
/usr/bin/zip[0x402eee]
/usr/bin/zip[0x403cce]
/usr/bin/zip[0x409566]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x397a41ed1d]
/usr/bin/zip[0x401fb9]
======= Memory map: ========
00400000-00432000 r-xp 00000000 fd:00 135862     /usr/bin/zip
00631000-00634000 rw-p 00031000 fd:00 135862     /usr/bin/zip
00634000-00682000 rw-p 00000000 00:00 0
00833000-00834000 rw-p 00033000 fd:00 135862     /usr/bin/zip
01a0f000-01a30000 rw-p 00000000 00:00 0          [heap]
3979c00000-3979c20000 r-xp 00000000 fd:00 1966151    /lib64/ld-2.12.so
3979e1f000-3979e20000 r--p 0001f000 fd:00 1966151    /lib64/ld-2.12.so
3979e20000-3979e21000 rw-p 00020000 fd:00 1966151    /lib64/ld-2.12.so
3979e21000-3979e22000 rw-p 00000000 00:00 0
397a400000-397a58b000 r-xp 00000000 fd:00 1966152    /lib64/libc-2.12.so
397a58b000-397a78a000 ---p 0018b000 fd:00 1966152    /lib64/libc-2.12.so
397a78a000-397a78e000 r--p 0018a000 fd:00 1966152    /lib64/libc-2.12.so
397a78e000-397a78f000 rw-p 0018e000 fd:00 1966152    /lib64/libc-2.12.so
397a78f000-397a794000 rw-p 00000000 00:00 0
397c000000-397c016000 r-xp 00000000 fd:00 1966220    /lib64/libgcc_s-4.4.7-20120601.so.1
397c016000-397c215000 ---p 00016000 fd:00 1966220    /lib64/libgcc_s-4.4.7-20120601.so.1
397c215000-397c216000 rw-p 00015000 fd:00 1966220    /lib64/libgcc_s-4.4.7-20120601.so.1
7fb65ef87000-7fb664e18000 r--p 00000000 fd:00 132167  /usr/lib/locale/locale-archive
7fb664e18000-7fb664e1b000 rw-p 00000000 00:00 0
7fb664e26000-7fb664e29000 rw-p 00000000 00:00 0
7fff56934000-7fff5694a000 rw-p 00000000 00:00 0      [stack]
7fff569ff000-7fff56a00000 r-xp 00000000 00:00 0      [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0   [vsyscall]
zipup.pl: Zipfile XXXXXXY.zip did not test OK
```
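For reference, one shape a fix could take is to have each worker ignore the terminal-generated signals before launching zip, so only the parent decides when to stop. This is a minimal sketch with a bare fork(); in the real program the same two %SIG assignments would go inside the Parallel::ForkManager child, and the commented-out line marks where the original pipe-open belongs:

```perl
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: 'IGNORE' maps to SIG_IGN, which is preserved across exec().
    # A zip grandchild started from here therefore ignores SIGINT/SIGQUIT
    # as well, so a Ctrl-C delivered to the whole foreground process group
    # can no longer abort it mid-archive.
    $SIG{INT}  = 'IGNORE';
    $SIG{QUIT} = 'IGNORE';

    # The real program would now run, e.g.:
    #   open(PIPE, '-|:unix', '/usr/bin/zip', '-T', $zipfile, @to_zip);
    select(undef, undef, undef, 0.5);   # simulate the zip work
    exit 0;
}

# Parent: demonstrate that the child now survives a SIGINT aimed at it.
select(undef, undef, undef, 0.2);       # give the child time to install handlers
kill 'INT', $pid;
waitpid($pid, 0);
print(($? == 0) ? "child finished cleanly\n" : "child died\n");
```

The trade-off is that the workers can then only be stopped by the parent (or by an unblockable signal like SIGKILL), so the parent needs its own handler that waits for in-flight zips before exiting.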
So my question before the Wise is this: should I look into another process manager that allows forks/execs within children (Parallel::MPM::Prefork seems to), use a Perl module to do the zipping, or something completely different? The reason I ask the wise is that there seem to be many solutions out there--more than I have time to try--and the wise will have experience with some of them. (See Niels Bohr's definition of "Expert".) BTW, one of my goals is to keep the number of module dependencies to a minimum.
Re: Task Partitioning and Parallelism Advice needed
by Anonymous Monk on Dec 09, 2014 at 03:37 UTC
    by co-jimbob (Initiate) on Dec 09, 2014 at 17:01 UTC
Re: Task Partitioning and Parallelism Advice needed
by Anonymous Monk on Dec 09, 2014 at 17:22 UTC