in reply to Parallel Modules ?
All right ... where do these subroutines fetch their data from? Have you established that every component of that system can actually deliver ten times as much data per second as it does now?
Have you taken a murderously close look at the algorithm, to verify that each time these subroutines execute they fetch exactly what they need to solve the problem and nothing more, and that they never fetch anything they have already requested before?
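One cheap way to check that last point is to count the fetches before touching any parallel code. Here is a minimal sketch in Python, only to illustrate the idea. The names `fetch`, `process` and `redundant_fetches` are made up for this example, and `fetch` is merely a stand-in for whatever expensive read (disk, database, network) your subroutines really perform.

```python
from collections import Counter
from functools import lru_cache

# How many times each key was actually fetched from the (pretend) slow source.
fetch_log = Counter()

def fetch(key):
    """Hypothetical stand-in for an expensive read; records every real fetch."""
    fetch_log[key] += 1
    return key * 2

def process(keys, fetcher):
    """Run the 'algorithm' over the keys using the supplied fetch function."""
    return [fetcher(k) for k in keys]

def redundant_fetches(log):
    """Count fetches beyond the first one for each key."""
    return sum(n - 1 for n in log.values())

keys = [1, 2, 3, 2, 1, 1]

# First pass: no caching, so repeated keys are fetched again.
plain = process(keys, fetch)
wasted_plain = redundant_fetches(fetch_log)

# Second pass: same work, but repeated keys are served from an in-memory cache.
fetch_log.clear()
cached_fetch = lru_cache(maxsize=None)(fetch)
cached = process(keys, cached_fetch)
wasted_cached = redundant_fetches(fetch_log)

print("redundant fetches without cache:", wasted_plain)
print("redundant fetches with cache:", wasted_cached)
```

If the first number is large, the serial code is already asking its data source for more than it needs, and running ten copies of it at once will only multiply that demand.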
I say this cordially, because far too often I have watched efforts to “parallelize” something produce a final product that is noticeably(!) slower than its predecessor. Very fine parallelization engines do of course exist, and several of these have already been cited. But I suggest that you cross-examine your algorithm first.