http://qs1969.pair.com?node_id=11149985


in reply to Re^4: Script exponentially slower as number of files to process increases
in thread Script exponentially slower as number of files to process increases

Certainly, there's an anomaly. Do you know what causes the beautiful script's run time to suddenly drop from 110 seconds to 50 seconds? Unfortunately, half of the workers exit because of a file-open error. The failures go unnoticed because no warning message is printed.

Applying the changes here may enlighten you as to why. Another check is to run ls -l data.dat. Is the file size smaller than expected?
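To make such silent failures visible, each worker can report a failed open and exit non-zero, and the parent can count the failures when reaping. The following is only a minimal sketch, not the script from this thread: process_file and run_workers are hypothetical names, and it assumes one forked worker per file.

```perl
use strict;
use warnings;

# Hypothetical worker body: report a failed open instead of exiting silently.
sub process_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or do {
        warn "worker $$: cannot open '$path': $!\n";
        return 0;
    };
    my $lines = 0;
    $lines++ while <$fh>;    # stand-in for the real per-file work
    close $fh;
    return 1;
}

# Fork one worker per path and return how many workers failed.
sub run_workers {
    my @paths = @_;
    my %path_of;
    for my $path (@paths) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: exit status 0 on success, 1 on failure.
            exit(process_file($path) ? 0 : 1);
        }
        $path_of{$pid} = $path;
    }
    my $failed = 0;
    while ((my $pid = wait()) != -1) {
        next if $? == 0;
        $failed++;
        warn "worker $pid ($path_of{$pid}) exited with status ", $? >> 8, "\n";
    }
    return $failed;
}

printf "%d of %d workers failed\n", run_workers(@ARGV), scalar(@ARGV);
```

Run it with a list of file names as arguments; any path that cannot be opened produces a warning from the worker and another from the parent.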

Re^6: Script exponentially slower as number of files to process increases
by cavac (Parson) on Feb 02, 2023 at 13:08 UTC

    I would also suggest checking the operating system logs. Depending on the OP's setup, the system (or some security software) may throttle or slow down the ability to fork new processes if it thinks there is something strange going on. (This is similar to how init may prevent daemons from restarting too often in a given timeframe.)

Re^6: Script exponentially slower as number of files to process increases
by kikuchiyo (Hermit) on Jan 28, 2023 at 17:44 UTC
    You could run the script with strace -o /tmp/trace -ff -e trace=%file script.pl to see why the file opens fail.

      For the kikuchiyo.pl script, the fix is to initialize $subdircount to $maxforks - 1. That ensures safety, as each worker sets $i using the same range (starting at 0).

      my $subdircount = $maxforks - 1;
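Why the range matters can be shown with a small count. This sketch rests on assumptions not confirmed by the scripts in this thread: subdirectories are numbered 0 .. $subdircount, and worker $i (0 .. $maxforks - 1) works in subdirectory $i. The function name workers_without_subdir is hypothetical.

```perl
use strict;
use warnings;

# Assumption: subdirectories 0 .. $subdircount exist, and worker $i
# (0 .. $maxforks - 1) expects to work in subdirectory $i.
# Returns the worker indices that have no matching subdirectory.
sub workers_without_subdir {
    my ($maxforks, $subdircount) = @_;
    return grep { $_ > $subdircount } 0 .. $maxforks - 1;
}

my $maxforks = 512;
for my $subdircount (256, $maxforks - 1) {
    my @orphans = workers_without_subdir($maxforks, $subdircount);
    printf "subdircount=%d: %d of %d workers have no subdirectory\n",
        $subdircount, scalar(@orphans), $maxforks;
}
```

Under these assumptions, setting $subdircount to $maxforks - 1 leaves no worker without a subdirectory, while a smaller value leaves every worker above it with nothing to open.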

      Update: Added results for threads and MCE::Child.

      Running with 512 workers:

      For a CPU-bound task, there's no reason to run many, many workers beyond the physical limit of the machine.
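One way to apply this is to cap the requested worker count at the number of CPUs. The sketch below is Linux-only and hedged: cpu_count and worker_count are hypothetical helpers, and counting "processor" entries in /proc/cpuinfo gives logical CPUs, which can exceed the number of physical cores.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Linux-only assumption: count "processor" entries in /proc/cpuinfo.
# This yields logical CPUs; falls back to 1 if the file is unavailable.
sub cpu_count {
    my $n = 0;
    if (open(my $fh, '<', '/proc/cpuinfo')) {
        while (<$fh>) { $n++ if /^processor\s*:/ }
        close $fh;
    }
    return $n || 1;
}

# Never run more workers than there are CPUs.
sub worker_count {
    my ($requested, $ncpu) = @_;
    return min($requested, $ncpu);
}

my $ncpu = cpu_count();
printf "requested 512 workers, using %d on %d logical CPUs\n",
    worker_count(512, $ncpu), $ncpu;
```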

        I ran some more tests. Your latest mceiter2.pl script is consistently faster (albeit by a small margin) than fork.pl; however, the run time includes the output file merging, which is a measurable gain.
      Because $subdir doesn't reset to 0 after the 256th fork as it does in the OP's script; 256 is, incidentally, the $subdircount value.
Re^6: Script exponentially slower as number of files to process increases
by xnous (Sexton) on Jan 28, 2023 at 17:51 UTC