http://qs1969.pair.com?node_id=11149994


in reply to Re^6: Script exponentially slower as number of files to process increases
in thread Script exponentially slower as number of files to process increases

For the kikuchiyo.pl script, the fix is to initialize $subdircount to $maxforks - 1. That is safe because each worker sets $i from the same range, starting at 0.

my $subdircount = $maxforks - 1;

Update: Added results for threads and MCE::Child.

Running with 512 workers:

########################################################
# threads, mod: init data after spawning threads
# https://www.perlmonks.org/?node_id=11149880
########################################################

$ rm -fr temp data.dat
$ time ../threads.pl ; ls -lh data.dat
Parsing 61441 files
maxthreads: 512
regex: 13.389084

real    0m13.443s
user    3m25.771s
sys     0m3.514s
-rw-r--r-- 1 mario mario 471M Jan 28 13:25 data.dat

########################################################
# kikuchiyo.pl, mod: my $subdircount = $maxforks - 1;
# https://www.perlmonks.org/?node_id=11149910
########################################################

$ rm -fr temp data.dat
$ time ../kikuchiyo.pl ; ls -lh data.dat
Parsing 61441 files
maxforks: 512
regex: 7.810577

real    0m9.393s
user    2m11.069s
sys     0m4.456s
-rw-r--r-- 1 mario mario 471M Jan 28 12:46 data.dat

########################################################
# MCE::Child, mod: init data after spawning processes
# https://www.perlmonks.org/?node_id=11149917
########################################################

$ rm -fr temp data.dat
$ time ../mcechild.pl ; ls -lh data.dat
Parsing 61441 files
maxforks: 512
regex: 7.858324

real    0m7.890s
user    2m11.238s
sys     0m4.026s
-rw-r--r-- 1 mario mario 471M Jan 28 13:31 data.dat

########################################################
# mceiter1.pl, MCE chunking
# https://www.perlmonks.org/?node_id=11149971
########################################################

$ rm -fr temp data.dat
$ time ../mceiter1.pl ; ls -lh data.dat
Parsing 61441 files
maxforks: 512
chunksize: 128
regex: 7.827401

real    0m7.885s
user    2m12.198s
sys     0m2.232s
-rw-r--r-- 1 mario mario 471M Jan 28 12:50 data.dat

########################################################
# mceiter2.pl, MCE chunking + workers appending data.dat
# https://www.perlmonks.org/?node_id=11149978
########################################################

$ rm -fr temp data.dat
$ time ../mceiter2.pl ; ls -lh data.dat
Parsing 61441 files
maxforks: 512
chunksize: 50
regex: 4.294076

real    0m4.328s
user    2m2.587s
sys     0m1.759s
-rw-r--r-- 1 mario mario 471M Jan 28 12:51 data.dat
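The mceiter1.pl and mceiter2.pl runs hand files to workers in chunks (chunksize 128 and 50). As a rough illustration only, not the code of those scripts, this core-Perl sketch splits a hypothetical list of 61441 file names into chunks. It shows how many units of work the workers fetch: 481 or 1229 chunks instead of 61441 individual files. The helper name chunks() and the file names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into array refs of at most $size items.
sub chunks {
    my ($size, @items) = @_;
    my @out;
    push @out, [ splice @items, 0, $size ] while @items;
    return @out;
}

# Hypothetical file names; only the count (61441) matches the runs above.
my @files = map { "file$_.txt" } 1 .. 61441;

for my $chunksize (128, 50) {
    my @c = chunks($chunksize, @files);
    printf "chunksize %3d: %d chunks\n", $chunksize, scalar @c;
}
# chunksize 128: 481 chunks
# chunksize  50: 1229 chunks
```

With chunking, each request to the manager process returns a batch of file names, so per-file IPC overhead is paid once per chunk instead of once per file.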

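For a CPU-bound task, there's no reason to run more workers than the machine has CPU cores; beyond that physical limit, extra workers just compete for the same cores. Running with 32 workers: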

$ time ../threads.pl
Parsing 61441 files
maxthreads: 32
regex: 10.622881

real    0m10.649s
user    3m35.313s
sys     0m2.173s

$ time ../kikuchiyo.pl
Parsing 61441 files
maxforks: 32
regex: 7.71529

real    0m8.276s
user    2m7.270s
sys     0m2.150s

$ time ../mcechild.pl
Parsing 61441 files
maxforks: 32
regex: 7.859328

real    0m7.889s
user    2m4.981s
sys     0m2.286s

$ time ../mceiter1.pl
Parsing 61441 files
maxforks: 32
chunksize: 128
regex: 7.545659

real    0m7.581s
user    2m7.077s
sys     0m1.775s

$ time ../mceiter2.pl
Parsing 61441 files
maxforks: 32
chunksize: 50
regex: 4.066287

real    0m4.100s
user    2m3.994s
sys     0m0.880s
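Since a CPU-bound job gains nothing from more workers than cores, one option is to cap the requested worker count at the CPU count. This is a minimal sketch, not taken from any of the scripts above. The helpers ncpu() and worker_count() are hypothetical names. It uses MCE::Util::get_ncpu (logical CPUs) when MCE is installed. Otherwise it assumes a Unix system with the coreutils nproc command and falls back to 1.

```perl
use strict;
use warnings;

# Hypothetical helper: number of online (logical) CPUs.
# Prefer MCE::Util::get_ncpu when MCE is installed; otherwise assume
# coreutils `nproc` is available; as a last resort return 1.
sub ncpu {
    my $n = eval { require MCE::Util; MCE::Util::get_ncpu() };
    return $n if $n && $n > 0;
    chomp( $n = `nproc 2>/dev/null` // '' );
    return ( $n =~ /^\d+$/ && $n > 0 ) ? $n : 1;
}

# Hypothetical helper: cap the requested worker count at the CPU count,
# since extra workers on a CPU-bound task only compete for the same cores.
sub worker_count {
    my ($requested) = @_;
    my $ncpu = ncpu();
    return $requested < $ncpu ? $requested : $ncpu;
}

my $maxforks = worker_count(512);
print "maxforks: $maxforks\n";
```

On a machine reporting 32 logical CPUs, this would turn a request for 512 workers into 32, matching the second set of runs.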