in reply to Useful number of childs revisited [SOLVED]

Your results will depend, and will depend entirely, on “exactly what it is that the threads or processes are doing.”

In this case, you seem to be calculating factorials. This is a so-called CPU-bound activity: each thread always consumes its full time-slice until it is preempted by another thread, which dutifully consumes its full time-slice, and so on. On a machine with four cores, two threads will run twice as fast as one, and four threads twice again as fast as two; but there the improvement stops, and adding more threads makes things slightly worse because of the overhead of round-robin switching between them. (Probably too small to see.) The capacity of the one ruling constraint, the CPU, has been reached and fully utilized, and of course it cannot be exceeded.
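As a rough formula (a simplification that ignores the switching overhead), let T1 be the time one thread needs to do all the work, n the number of threads, and P the number of cores. With P = 4, it reproduces the pattern described above:

```latex
T(n) \approx \frac{T_1}{\min(n, P)}
\qquad\text{e.g. } P = 4:\quad
T(2) \approx \frac{T_1}{2},\quad
T(4) \approx \frac{T_1}{4},\quad
T(8) \approx \frac{T_1}{4}
```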

Most real-world activities are I/O-bound, either directly, because of the actual input/output they do, or indirectly, because of the virtual-memory page faults they induce by trying to use (far) too much memory. Their execution speed depends on the system's capacity to perform I/O. The threads or processes spend nearly all of their time waiting for I/O: either voluntarily, for an operation they requested, or involuntarily, because of a page fault. By comparison, their CPU usage is trivial.
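A small illustration of why core count is not the limit for I/O-bound work. This is a sketch, assuming a Perl built with thread support; the job is hypothetical, with `sleep` standing in for time spent blocked on a disk or network request. Note that `sleep` has unlimited "capacity", so this toy never hits the wall described next; a real disk or network link does.

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time sleep);

# Hypothetical stand-in for an I/O-bound job: sleep() models the time a
# thread spends blocked waiting on I/O, during which it uses almost no CPU.
sub fake_io_job { sleep 0.5; return }

my $JOBS  = 8;
my $start = time;

my @workers;
push @workers, threads->create(\&fake_io_job) for 1 .. $JOBS;
$_->join for @workers;

my $elapsed = time - $start;

# Back to back, 8 half-second waits would take about 4 s; overlapped, the
# total stays near 0.5 s whether the machine has one core or many.
printf "%d simulated I/O jobs finished in %.2f s\n", $JOBS, $elapsed;
```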

The trouble is that when an I/O-bound workload begins to get stoppered up, throughput degrades “at first linearly, then exponentially.” A plot of the performance curve has a nearly right-angled “elbow” in it: the point at which the system starts thrashing, or “hits the wall” (with a grisly and final “thud”). (Example: “6 at a time = 4 minutes; 12 at a time = 9 hours.” A bit extreme, yes, but long ago I saw it happen.)

To avoid this, the best approach is to do what any fast-food restaurant does: keep a manageable number of workers, each of which takes work from a thread-safe queue, so that no matter how much work there is to do, the amount of work in process can be limited and adjusted. (The waiting line just gets longer, but the number of transactions per second stays stable.) There are plenty of workload-management packages on CPAN to do this.
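A minimal sketch of that pattern using only the core `threads` and `Thread::Queue` modules, assuming a threaded Perl and a `Thread::Queue` recent enough to provide `end()`. The pool size of 4 and the factorial job are illustrative choices, not a recommendation; the point is that the pool size, not the backlog, bounds the work in process.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Illustrative pool size: this, not the length of the backlog, caps how
# much work is in process at once. Tune it to the real constraint.
my $WORKERS = 4;

# Illustrative job: n! computed by repeated multiplication (CPU-bound).
sub factorial {
    my ($n) = @_;
    my $f = 1;
    $f *= $_ for 2 .. $n;
    return $f;
}

my $work    = Thread::Queue->new;    # thread-safe queue of pending jobs
my $results = Thread::Queue->new;    # thread-safe queue of finished jobs

# Each worker takes one job at a time until the queue has been ended and
# drained, at which point dequeue() returns undef and the loop exits.
sub worker {
    while (defined(my $n = $work->dequeue)) {
        $results->enqueue([ $n, factorial($n) ]);
    }
    return;
}

my @pool;
push @pool, threads->create(\&worker) for 1 .. $WORKERS;

# Any amount of work can be queued; the line grows, the pool does not.
$work->enqueue($_) for 1 .. 20;
$work->end;                          # no more jobs will be added

$_->join for @pool;
$results->end;

my %answer;
while (defined(my $r = $results->dequeue)) {
    $answer{ $r->[0] } = $r->[1];
}
printf "%d jobs done by %d workers; 10! = %s\n",
    scalar(keys %answer), $WORKERS, $answer{10};
```

Changing `$WORKERS` is the single "adjustment knob"; queuing 20 jobs or 20,000 changes only how long the line is.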