Although there may be an explanation for it, I am a bit surprised that running at least a few processes in parallel does not bring you any performance improvement. I very often run intensive data extractions from a very large (split) database. This work is mostly I/O-bound, and I usually get the best results with a maximum number of processes somewhere between one and two times the number of CPUs or CPU cores. The results might be very different with a different setup. On the other hand, I have come across cases (badly written programs) where one process was locking data access for the others, which prevented any gain from parallel processing (and actually led to poorer performance). I wonder whether you are running into one of these cases.
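To find the sweet spot on your own machine, something along these lines could help. It is only a rough sketch in Python, not taken from any real program: `fake_extract`, `candidate_worker_counts` and `time_run` are hypothetical names, and the `time.sleep` call is merely a stand-in for the time a real extraction would spend waiting on I/O. It times the same batch of work with worker counts ranging from 1 up to twice the number of cores.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def fake_extract(chunk_id):
    """Hypothetical stand-in for one I/O-bound extraction job.

    The sleep only simulates waiting on disk or network; replace it
    with the real work when measuring.
    """
    time.sleep(0.05)
    return chunk_id


def candidate_worker_counts(cpus):
    """Worker counts worth trying: 1, then cpus, 1.5 * cpus and 2 * cpus."""
    return sorted({1, cpus, cpus + cpus // 2, 2 * cpus})


def time_run(workers, chunks):
    """Process all chunks with a pool of `workers` processes.

    Returns (elapsed seconds, list of results in input order).
    """
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_extract, chunks))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    for workers in candidate_worker_counts(cpus):
        elapsed, _ = time_run(workers, range(32))
        print(f"{workers:3d} workers: {elapsed:.2f} s")
```

If the timings do not improve at all as the worker count grows, that would be consistent with the processes being serialized by something shared, such as a lock on the data source, although other bottlenecks (a saturated disk, for instance) can produce the same picture.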