I'm using an Amazon EC2 m1.xlarge instance (4 vCPUs).
Because of waiting_all_children(), and because I added some prints to show when each thread begins and ends. I also measured the time from one block of 50 to the next, and it was the same as the time taken by the longest one.
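
To illustrate the kind of measurement I mean, here is a rough Python sketch (not my actual code, and not the same language or library). The names `worker` and `run_batch` and the sleep durations are made up for the example. It starts a small batch of threads, waits for all of them, and compares the batch time with each thread's own time:

```python
import threading
import time


def worker(name, duration, log):
    """Hypothetical worker: sleeps to stand in for real work and records its own run time."""
    start = time.monotonic()
    time.sleep(duration)
    log[name] = time.monotonic() - start


def run_batch(durations):
    """Start one thread per duration, wait for all of them, return (batch time, per-thread times)."""
    log = {}
    threads = [
        threading.Thread(target=worker, args=(i, d, log))
        for i, d in enumerate(durations)
    ]
    batch_start = time.monotonic()
    for t in threads:
        t.start()
    # Waiting for every thread plays the role of the "wait for all children" step.
    for t in threads:
        t.join()
    return time.monotonic() - batch_start, log


# Made-up durations; a real batch would hold 50 workers.
durations = [0.05, 0.10, 0.20]
elapsed, log = run_batch(durations)
print(f"batch took {elapsed:.2f}s, longest thread took {max(log.values()):.2f}s")
```

If the workers really run in parallel, the batch time should come out close to the longest single thread's time, not the sum of all of them.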