PerlMonks  

Re^7: Script exponentially slower as number of files to process increases

by marioroy (Parson)
on Jan 28, 2023 at 18:38 UTC (#11149994)


in reply to Re^6: Script exponentially slower as number of files to process increases
in thread Script exponentially slower as number of files to process increases

For the kikuchiyo.pl script, the fix is to initialize $subdircount to $maxforks - 1. That keeps it safe, because each worker sets $i over the same range (starting at 0).

my $subdircount = $maxforks - 1;

Update: Added results for threads and MCE::Child.

Running with 512 workers:

########################################################
# threads, mod: init data after spawning threads
# https://www.perlmonks.org/?node_id=11149880
########################################################

$ rm -fr temp data.dat
$ time ../threads.pl ; ls -lh data.dat
Parsing 61441 files
maxthreads: 512
regex: 13.389084

real 0m13.443s
user 3m25.771s
sys 0m3.514s

-rw-r--r-- 1 mario mario 471M Jan 28 13:25 data.dat

########################################################
# kikuchiyo.pl, mod: my $subdircount = $maxforks - 1;
# https://www.perlmonks.org/?node_id=11149910
########################################################

$ rm -fr temp data.dat
$ time ../kikuchiyo.pl ; ls -lh data.dat
Parsing 61441 files
maxforks: 512
regex: 7.810577

real 0m9.393s
user 2m11.069s
sys 0m4.456s

-rw-r--r-- 1 mario mario 471M Jan 28 12:46 data.dat

########################################################
# MCE::Child, mod: init data after spawning processes
# https://www.perlmonks.org/?node_id=11149917
########################################################

$ rm -fr temp data.dat
$ time ../mcechild.pl ; ls -lh data.dat
Parsing 61441 files
maxforks: 512
regex: 7.858324

real 0m7.890s
user 2m11.238s
sys 0m4.026s

-rw-r--r-- 1 mario mario 471M Jan 28 13:31 data.dat

########################################################
# mceiter1.pl, MCE chunking
# https://www.perlmonks.org/?node_id=11149971
########################################################

$ rm -fr temp data.dat
$ time ../mceiter1.pl ; ls -lh data.dat
Parsing 61441 files
maxforks: 512
chunksize: 128
regex: 7.827401

real 0m7.885s
user 2m12.198s
sys 0m2.232s

-rw-r--r-- 1 mario mario 471M Jan 28 12:50 data.dat

########################################################
# mceiter2.pl, MCE chunking + workers appending data.dat
# https://www.perlmonks.org/?node_id=11149978
########################################################

$ rm -fr temp data.dat
$ time ../mceiter2.pl ; ls -lh data.dat
Parsing 61441 files
maxforks: 512
chunksize: 50
regex: 4.294076

real 0m4.328s
user 2m2.587s
sys 0m1.759s

-rw-r--r-- 1 mario mario 471M Jan 28 12:51 data.dat

For a CPU-bound task, there is no reason to run far more workers than the hardware can physically run in parallel. The same scripts with 32 workers:

$ time ../threads.pl
Parsing 61441 files
maxthreads: 32
regex: 10.622881

real 0m10.649s
user 3m35.313s
sys 0m2.173s

$ time ../kikuchiyo.pl
Parsing 61441 files
maxforks: 32
regex: 7.71529

real 0m8.276s
user 2m7.270s
sys 0m2.150s

$ time ../mcechild.pl
Parsing 61441 files
maxforks: 32
regex: 7.859328

real 0m7.889s
user 2m4.981s
sys 0m2.286s

$ time ../mceiter1.pl
Parsing 61441 files
maxforks: 32
chunksize: 128
regex: 7.545659

real 0m7.581s
user 2m7.077s
sys 0m1.775s

$ time ../mceiter2.pl
Parsing 61441 files
maxforks: 32
chunksize: 50
regex: 4.066287

real 0m4.100s
user 2m3.994s
sys 0m0.880s
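
The timings above show 32 workers finishing about as fast as 512, so one option is to cap the worker count at the number of logical CPUs. The following is only a minimal sketch using core Perl; it is not taken from any of the scripts in this thread. The helper names `logical_cpus` and `cap_workers` are hypothetical, and the sketch assumes a system where `getconf _NPROCESSORS_ONLN` is available (typically Linux and macOS), falling back to 1 otherwise.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: ask the OS how many logical CPUs are online.
# Assumes `getconf _NPROCESSORS_ONLN` works; returns 1 if it does not.
sub logical_cpus {
    my $out = `getconf _NPROCESSORS_ONLN 2>/dev/null`;
    if (defined $out && $out =~ /^(\d+)/ && $1 > 0) {
        return $1 + 0;
    }
    return 1;
}

# Hypothetical helper: clamp a requested worker count to the range 1..$ncpu.
sub cap_workers {
    my ($requested, $ncpu) = @_;
    return max(1, min($requested, $ncpu));
}

my $ncpu = logical_cpus();
printf "logical cpus: %d, workers used for 512 requested: %d\n",
    $ncpu, cap_workers(512, $ncpu);
```

Whether the logical or the physical core count is the better ceiling depends on the workload, so treat the cap as a starting point to benchmark rather than a rule.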

Replies are listed 'Best First'.
Re^8: Script exponentially slower as number of files to process increases
by xnous (Sexton) on Jan 30, 2023 at 22:01 UTC
    I ran some more tests. Your latest mceiter2.pl script is consistently faster than fork.pl, albeit by a small margin; however, its run time includes the output file merging, which is a measurable gain.

      So nice of you to write back. That script you linked to is quite efficient, considering it removes the merge step from the manager process entirely (the workers now handle it cooperatively).
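
To illustrate the general idea of workers appending to one output file themselves instead of handing data back to a manager for merging, here is a minimal sketch in core Perl. It is not the code of mceiter2.pl and does not use MCE; it assumes plain `fork` plus an exclusive `flock` around each append, and the names `append_block` and `run_workers` are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use POSIX ();

# Hypothetical helper: append one worker's result block to a shared file.
# The exclusive lock keeps blocks from different workers from interleaving.
sub append_block {
    my ($path, $block) = @_;
    open(my $fh, '>>', $path) or die "open $path: $!";
    flock($fh, LOCK_EX)       or die "flock $path: $!";
    print {$fh} $block        or die "print $path: $!";
    close($fh)                or die "close $path: $!";  # flushes, then drops the lock
    return;
}

# Hypothetical driver: fork $workers children; each builds its own block
# and appends it directly, so the parent never merges anything.
sub run_workers {
    my ($path, $workers, $lines_per_block) = @_;
    my @pids;
    for my $id (0 .. $workers - 1) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            my $block = join '',
                map { "worker $id line $_\n" } 1 .. $lines_per_block;
            append_block($path, $block);
            POSIX::_exit(0);    # leave the child without running END blocks
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;   # parent only waits for the children
    return;
}

# Small demonstration: 4 workers, 3 lines each, into a temporary file.
my ($demo_fh, $demo_path) = tempfile(UNLINK => 1);
close $demo_fh;
run_workers($demo_path, 4, 3);
open(my $demo_in, '<', $demo_path) or die "open $demo_path: $!";
my @demo_lines = <$demo_in>;
close $demo_in;
printf "%d lines appended by 4 workers\n", scalar @demo_lines;
```

The order of the blocks in the file depends on which worker obtains the lock first, so this approach only fits when the output order does not matter or is fixed up afterwards.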
