Re^4: Script exponentially slower as number of files to process increases

You are not checking for fork failures.

Worse: xnous does check for fork failures, but way too late:

    if (my $pid = fork) { # $pid defined and !=0 -->parent
        ++$forkcount;
    } else { # $pid==0 -->child
        open my $IN, '<', $infile or exit(0);
        open my $OUT, '>', "$tempdir/$subdir/text-$i" or exit(0);
        while (<$IN>) {
            tr/-!"#%&()*',.\/:;?@\[\\\]農怒筑><^)(|/ /; # no punct "
            s/^/ /;
            s/\n/ \n/;
            s/[[:digit:]]{1,12}//g;
            s/w(as|ere)/be/gi;
            s{$re2}{ $prefix{lc $1} }g;  # prefix
            s{$re3}{ $substring{lc $1} }g;  # part
            s{$re1}{ $whole{lc $1} }g;  # whole
            print $OUT "$_";
        }
        close $OUT;
        close $IN;
        defined $pid and exit(0); # $pid==0 -->child, must exit itself
    }

If fork() fails, it returns undef, so $pid is undef, which is false. Perl therefore enters the else block and runs everything meant for a child process, but in the parent process. While that code runs, none of the child process management (i.e. $forkcount and wait/waitpid) happens. The check that tells a failed fork() from a real child (defined $pid) comes only after the child code has already run in the parent. And a failed fork() produces no diagnostics at all.
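To make the problem concrete, here is a minimal sketch of the three possible return values of fork() and how a plain truth test treats them. The helper classify_fork_result is a hypothetical name of mine, not something from xnous' script, and 1234 just stands in for an arbitrary child PID.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): names the three
# possible outcomes of fork() based on its return value.
sub classify_fork_result {
    my ($pid) = @_;
    return 'failed' unless defined $pid;   # undef: fork() failed, we are still the parent
    return 'child'  if $pid == 0;          # 0: we are the newly created child
    return 'parent';                       # >0: we are the parent, $pid is the child's PID
}

# A plain truth test cannot tell 'failed' from 'child': both are false.
for my $pid (undef, 0, 1234) {
    printf "%-6s truth test: %-5s  actual: %s\n",
        defined $pid ? $pid : 'undef',
        $pid ? 'true' : 'false',
        classify_fork_result($pid);
}
```

Only the defined test separates the failure case from the child case, so it has to come before any code that is supposed to run in the child.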

When I use fork(), I usually write forking code like this:

    my $pid=fork() // die "Can't fork: $!";
    if ($pid) {
        # parent code
    } else {
        # child code
    }

Before Perl had the defined-or operator // (added in Perl 5.10), I used the following two lines instead of the first one:

    my $pid=fork();
    defined($pid) or die "Can't fork: $!";
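For completeness, a small runnable sketch of that pattern with the parent and child parts filled in. The wrapper name run_in_child and the convention that the code reference returns the child's exit code are assumptions for illustration only; they do not come from the script under discussion.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run $work in a child process and return the
# child's exit code to the caller in the parent.
sub run_in_child {
    my ($work) = @_;

    # Fail loudly, in the parent, before any child code can run.
    my $pid = fork() // die "Can't fork: $!";

    if ($pid) {
        # parent: wait for exactly this child and extract its exit code
        waitpid($pid, 0) == $pid or die "waitpid failed: $!";
        return $? >> 8;
    }

    # child: do the work, then always exit, so the child can never
    # fall through into code that belongs to the parent
    exit($work->());
}

my $status = run_in_child(sub { return 3 });
print "child exited with status $status\n";
```

With the check placed directly at the fork() call, a failed fork() stops the parent with a message instead of silently running the child code in the wrong process.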

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
