in reply to Script exponentially slower as number of files to process increases

$forkcount -= waitpid(-1, WNOHANG) > 0 while $forkcount >= $maxforks;

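This line busy-loops: with WNOHANG, waitpid returns immediately even when no child has exited yet, so the loop spins until one does. That is probably what is chewing up your "system" time. Replace it with: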

wait, $forkcount-- while $forkcount >= $maxforks;
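For context, here is a minimal sketch of how a blocking wait could sit inside a throttled fork loop. The sub name run_jobs, the job list and the limit of 4 are made up for illustration and are not from the original script; the child does no real work.

```perl
use strict;
use warnings;

# Hypothetical helper: run one child per job, never more than
# $maxforks at a time. Returns the number of children reaped.
sub run_jobs {
    my ($maxforks, @jobs) = @_;
    my ($forkcount, $reaped) = (0, 0);

    for my $job (@jobs) {
        # wait() blocks until some child exits, so this loop sleeps
        # in the kernel instead of spinning on waitpid(-1, WNOHANG).
        while ($forkcount >= $maxforks) {
            last if wait() == -1;    # -1: no children left to wait for
            $forkcount--;
            $reaped++;
        }

        my $pid = fork();
        defined $pid or die "Can't fork: $!";
        if ($pid) {
            $forkcount++;            # parent: one more child running
        } else {
            exit(0);                 # child: real work would go here
        }
    }

    # Reap whatever is still running after the last fork.
    while (wait() != -1) {
        $forkcount--;
        $reaped++;
    }
    return $reaped;
}

print "reaped ", run_jobs(4, 1 .. 10), " children\n";
```

Because every child is collected by exactly one wait() call, the returned count should equal the number of jobs.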

Re^2: Script exponentially slower as number of files to process increases
by xnous (Sexton) on Jan 26, 2023 at 09:48 UTC
    No, it doesn't. It actually makes no difference whether that wait() is there or not. I'm getting the same results with my version, yours or no $forkcount at all. It's very weird.

      Is there a fork limit on your system? You are not checking for fork failures...

        You are not checking for fork failures.

        Worse: xnous does check for fork failures, but way too late:

        if (my $pid = fork) { # $pid defined and !=0 -->parent
            ++$forkcount;
        } else { # $pid==0 -->child
            open my $IN, '<', $infile or exit(0);
            open my $OUT, '>', "$tempdir/$subdir/text-$i" or exit(0);
            while (<$IN>) {
                tr/-!"#%&()*',.\/:;?@\[\\\]”_“{’}><^)(|/ /; # no punct "
                s/^/ /;
                s/\n/ \n/;
                s/[[:digit:]]{1,12}//g;
                s/w(as|ere)/be/gi;
                s{$re2}{ $prefix{lc $1} }g; # prefix
                s{$re3}{ $substring{lc $1} }g; # part
                s{$re1}{ $whole{lc $1} }g; # whole
                print $OUT "$_";
            }
            close $OUT;
            close $IN;
            defined $pid and exit(0); # $pid==0 -->child, must exit itself
        }

        If fork() fails, $pid is undef, which is false. So perl will enter the else block, do everything that a child process does, but in the parent process. During that time, the entire child process management (i.e. $forkcount and wait/waitpid) does not happen. The check for failed fork() vs. real child (defined $pid) happens after the child code has run in the parent process. And it lacks any diagnostics.
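The branch selection described above can be checked without forking at all. This is only a sketch: branch_for is a made-up helper that mimics the shape of the if/else, and 4711 stands in for an arbitrary child PID.

```perl
use strict;
use warnings;

# Hypothetical helper: reports which branch of
#   if ($pid) { ... } else { ... }
# is taken for a given fork() return value.
sub branch_for {
    my ($pid) = @_;
    if ($pid) { return 'parent' }
    else      { return 'child'  }
}

# fork() returns undef on failure, 0 in the child,
# and the child's PID in the parent.
for my $case ([undef, 'failed fork'], [0, 'real child'], [4711, 'parent']) {
    my ($pid, $label) = @$case;
    printf "%-11s -> %s branch\n", $label, branch_for($pid);
}
```

Both undef and 0 are false, so a plain truth test cannot tell a failed fork from a real child; only defined() can.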

        When I use fork(), I usually write forking code like this:

        my $pid=fork() // die "Can't fork: $!";
        if ($pid) {
            # parent code
        } else {
            # child code
        }

        Before Perl had the defined-or operator //, I used the following two lines instead of the first one.

        my $pid=fork();
        defined($pid) or die "Can't fork: $!";

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      What's your ulimit -u for max user processes?

        ulimit -u returns 128011