in reply to UPDATED: Forking On Foreach Keys In Hash, Passing Hash To Sub, And Speed/Efficiency Recommendations
    foreach $key (sort keys %npanxxhash) {
        &CountAndHash($key,\@npanxxarray,\%npanxxhash);
    }

The parent process in the subroutine is calling "waitpid" on its child process, and so it doesn't return until the child process is done. That means each pass through the foreach loop blocks, and only one child is ever running at a time. I don't do parallel stuff much - I hope the suggestion above about Parallel::ForkManager will be useful. Short of using that, I think the thing you might want to try is to have the subroutine return the pid of the child, push that onto an array or hash in the foreach loop, and then after that loop is done (while children are still running), call waitpid repeatedly until there are no more children pending. (Or something to that effect… again, I'm not an expert on this.)
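Here is a minimal sketch of the "return the pid, wait later" idea. It is not the original CountAndHash: `spawn_worker`, the `%pending` and `%status` hashes, and the three keys are made-up names for illustration, and the child does no real work.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real subroutine: fork, do the work in the
# child, and hand the child's pid back to the parent WITHOUT waiting on it.
sub spawn_worker {
    my ($key) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child process: the real per-key work would go here.
        exit 0;
    }
    return $pid;    # Parent process: return right away.
}

# Start every child first, remembering which pid belongs to which key.
my %pending;        # pid => key
for my $key (qw(201555 202555 203555)) {    # made-up keys
    $pending{ spawn_worker($key) } = $key;
}

# Only now, with all children started, wait for each of them in turn.
my %status;         # key => exit code of the child that handled it
for my $pid ( keys %pending ) {
    waitpid( $pid, 0 );                     # blocks until this child exits
    $status{ $pending{$pid} } = $? >> 8;    # high byte of $? is the exit code
}
```

One thing to keep in mind with this layout: whatever a child stores in a hash only changes the child's own copy, so results have to come back to the parent some other way (a file, a pipe, or the exit status as above).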
Also, this is a minor point, but on 5 million lines of input (however many characters per line), the difference could be noticeable. Instead of this:

    while (<IN>) {
        if ( $_ =~ m/^{.*$/ ) {
            #Grab 9999991234 from line above
            my ($a,$MIN,$c,$d,$e,$f) = split( /,/ );
            $minhash{$MIN} = undef;
        }
    }

Try this -- note the difference in the regex and split (the syntax changes are just style preferences):

    while (<IN>) {
        next unless ( /^\{/ );  ## we only need to check the first character.
        #Grab 9999991234 from line above
        my $MIN = ( split /,/ )[1];  ## we only need to assign one variable
        $minhash{$MIN} = undef;
    }
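To see whether the change matters on your data, the two loop bodies can be timed with the core Benchmark module. This is only a rough sketch: the sample line is invented (I don't know the real input format), the sub names are made up, and the numbers will vary with the data and the perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample line; the real input format is an assumption.
my $line = '{foo,9999991234,c,d,e,f';

# Original style: match the whole line, split into six named variables.
sub full_match_and_list {
    local $_ = $line;
    my $min;
    if ( $_ =~ m/^\{.*$/ ) {
        my ( $first, $MIN, $c, $d, $e, $f ) = split( /,/ );
        $min = $MIN;
    }
    return $min;
}

# Suggested style: test only the first character, keep only field 1.
sub anchor_and_slice {
    local $_ = $line;
    return unless /^\{/;
    return ( split /,/ )[1];
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, {
    full_match_and_list => \&full_match_and_list,
    anchor_and_slice    => \&anchor_and_slice,
} );
```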