in reply to UPDATED: Forking On Foreach Keys In Hash, Passing Hash To Sub, And Speed/Efficiency Recommendations

Apart from the points mentioned by others, the reason the script isn't really doing anything in parallel is that on each iteration of this loop:
foreach $key (sort keys %npanxxhash) { &CountAndHash($key,\@npanxxarray,\%npanxxhash); }
the parent process calls "waitpid" on its child inside the subroutine, so the subroutine doesn't return (and the next child isn't started) until the current child is done. I don't do parallel stuff much - I hope the suggestion above about Parallel::ForkManager will be useful. Short of using that, I think the thing to try is to have the subroutine return the pid of the child, push that onto an array or hash in the foreach loop, and then, once that loop is done (while children are still running), call waitpid repeatedly until there are no more children pending. (Or something to that effect… again, I'm not an expert on this.)
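To make that concrete, here is a minimal sketch of the "return the pid, wait later" idea. The names start_child and run_all and the sample keys are made up for illustration, and the child body is only a placeholder for whatever CountAndHash really does, so treat this as an outline under those assumptions rather than a drop-in replacement:

```perl
use strict;
use warnings;

# Hypothetical stand-in for CountAndHash: fork, do the work in the child,
# and hand the child's pid back to the caller instead of waiting for it.
sub start_child {
    my ($key) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: the real per-key work would go here.
        exit 0;
    }
    return $pid;    # Parent: return right away, no waitpid here.
}

# Hypothetical driver: start one child per key, then reap them all.
sub run_all {
    my @keys = @_;
    my @pids = map { start_child($_) } @keys;    # all children now running
    my %status;
    for my $pid (@pids) {
        waitpid( $pid, 0 );                      # blocks until this child exits
        $status{$pid} = $? >> 8;                 # exit code of that child
    }
    return \%status;
}

my $status = run_all(qw(201555 212555 312555));    # made-up keys
printf "%d children finished\n", scalar keys %$status;
```

Keep in mind that each child works on its own copy of the parent's data, so anything a child stores in a hash is not visible to the parent afterwards. Starting one child per key can also mean a lot of processes at once; capping that number is one of the things Parallel::ForkManager handles for you.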

Also, this is a minor point, but on 5 million lines of input (with any number of characters per line), the difference could be noticeable - instead of this:

    while (<IN>) {
        if ( $_ =~ m/^{.*$/ ) {
            #Grab 9999991234 from line above
            my ($a,$MIN,$c,$d,$e,$f) = split( /,/ );
            $minhash{$MIN} = undef;
        }
    }
Try this -- note the difference in the regex and split (the syntax changes are just style preferences):
    while (<IN>) {
        next unless /^\{/;    ## we only need to check the first character
        #Grab 9999991234 from line above
        my $MIN = ( split /,/ )[1];    ## we only need to assign one variable
        $minhash{$MIN} = undef;
    }
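If you want to see whether the difference matters on your data, the core Benchmark module can compare the two versions. This is only a sketch: the sample lines are invented, the sub names full_match and first_char are mine, and the loops run over an in-memory array instead of the IN filehandle, so the numbers will not match a real run against the file:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented sample data: lines starting with "{" carry the number in field 2.
my @lines = map { ( "{rec,999999$_,c,d,e,f\n", "other,line,$_\n" ) } 1000 .. 1999;

# Original style: match the whole line, assign all six fields.
sub full_match {
    my %minhash;
    for (@lines) {
        if ( $_ =~ m/^\{.*$/ ) {
            my ( $f0, $MIN, $f2, $f3, $f4, $f5 ) = split( /,/ );
            $minhash{$MIN} = undef;
        }
    }
    return \%minhash;
}

# Suggested style: check only the first character, keep only field 2.
sub first_char {
    my %minhash;
    for (@lines) {
        next unless /^\{/;
        my $MIN = ( split /,/ )[1];
        $minhash{$MIN} = undef;
    }
    return \%minhash;
}

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese( -1, { full_match => \&full_match, first_char => \&first_char } );
```

Both subs should build the same set of keys, so the comparison is only about speed. Whether the gap is worth caring about depends on your Perl version and on how long the real lines are.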