You run "for each of the N processes, wait until it is done" N times, which means you perform N*N waits when there are only N processes. Your pseudocode is flawed as written, so I'm not sure what you are trying to do, but I think those loops shouldn't be nested.
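
If the goal is to start N processes and then wait for all of them to finish, here is a minimal sketch of the un-nested version. I'm guessing at your intent, and the language (Python), the helper name `run_all`, and the trivial child command are my assumptions, not something from your code:

```python
import subprocess
import sys


def run_all(n):
    """Start n child processes, then wait for each one exactly once.

    Hypothetical helper for illustration; the child command is a placeholder.
    """
    # First loop: launch every process without waiting on any of them.
    procs = [subprocess.Popen([sys.executable, "-c", "pass"]) for _ in range(n)]

    # Second loop, NOT nested in the first: one wait per process, n waits total.
    return [p.wait() for p in procs]


if __name__ == "__main__":
    exit_codes = run_all(4)
    print(exit_codes)
```

The two loops run one after the other, so the total number of waits is N rather than N*N. Waiting on a process that has already exited returns immediately, so the order of the waits doesn't matter.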