in reply to Question about Parallel::ForkManager
If you spawn one process per URL, you're bound to cause massive thrashing, and that is where your disk usage is coming from. Instead, launch a small number of child processes that consume URLs from a thread-safe queue. The number of children should have no relation to the size of the workload they must cooperatively accomplish. Rather, it should be tied to how many parallel processes you have determined the system can actually handle with maximum sustained throughput. (Do not be surprised if the best answer is "1.")
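With Parallel::ForkManager specifically, the usual way to get this effect is not long-lived workers pulling from a queue but a cap on the pool size: start() blocks until one of the running children exits, so at most N children ever exist at once, regardless of how many URLs there are. Here is a minimal sketch along those lines; the use of LWP::UserAgent for the fetch, the @ARGV input, and the cap of 4 are illustrative assumptions, not anything from your post:

    use strict;
    use warnings;
    use Parallel::ForkManager;
    use LWP::UserAgent;

    my @urls        = @ARGV;   # hypothetical input: URLs on the command line
    my $max_workers = 4;       # tune to what the machine sustains, not to @urls

    my $pm = Parallel::ForkManager->new($max_workers);
    my $ua = LWP::UserAgent->new( timeout => 30 );

    URL: for my $url (@urls) {
        # start() returns the child PID in the parent (true) and 0 in the
        # child; it blocks whenever $max_workers children are already running
        $pm->start and next URL;

        my $res = $ua->get($url);
        warn "$url: " . $res->status_line . "\n" unless $res->is_success;

        $pm->finish;           # child exits here
    }
    $pm->wait_all_children;

Benchmark with $max_workers = 1 first and only raise it while total throughput keeps improving; once the disk or the network saturates, more children just add contention.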