in reply to Downloading URL's in Parallel with Perl

I just wanted to make a quick comment. Make sure you are not falling prey to the fallacious logic that by calling forth the gods of "explicit parallelism" you are guaranteed to incur a speedup. Remember, in situations like this, where you are *pulling* on the dataflow and, more importantly, your data set rests on a node from which you have no guaranteed transfer rate, it is possible that parallelizing your GETs will neither increase your actual throughput nor reduce your wall-clock time for the entire transaction.

Remember your Von Neumann bottleneck: it is doubtful that what is slowing down your task is the overhead of processing the data. It is much more likely that the bottleneck is the actual data pipe (in other words, not processing BUT bandwidth!). Attempting to stuff 10k/sec of data down a 1k/sec pipe won't make the pipe bigger... it may actually end up increasing your overall wall-clock time due to TCP congestion, retransmissions, and other assorted baddies. I'm a big fan of parallelization throughout... just make sure it makes *sense* in your particular configuration.
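
To put rough numbers on that, here is a back-of-envelope sketch (in Python only because it is easy to run; the idea is language-neutral). Everything in it is an assumption made for illustration: the function name `estimated_wall_clock`, the idea that all connections share one fixed-rate pipe, a flat per-request latency, and a made-up `penalty_per_extra_worker` knob standing in for congestion and retransmission overhead. It is a toy model, not a measurement of any real network.

```python
import math

def estimated_wall_clock(total_kb, n_requests, workers, pipe_kb_per_s,
                         latency_s=0.0, penalty_per_extra_worker=0.0):
    """Crude estimate of total download time in seconds (toy model)."""
    # The shared pipe sets a floor: total bytes / bandwidth, however
    # many connections are open at once.
    transfer_s = total_kb / pipe_kb_per_s
    # Hypothetical knob: each extra connection wastes a fixed fraction
    # of the pipe on congestion and retransmissions.
    transfer_s *= 1.0 + penalty_per_extra_worker * (workers - 1)
    # Per-request latency is the only cost that parallelism overlaps.
    rounds = math.ceil(n_requests / workers)
    return transfer_s + rounds * latency_s

# 10 files, 1000k total, down a 1k/sec pipe
serial = estimated_wall_clock(1000, 10, workers=1, pipe_kb_per_s=1.0)
parallel = estimated_wall_clock(1000, 10, workers=10, pipe_kb_per_s=1.0)
congested = estimated_wall_clock(1000, 10, workers=10, pipe_kb_per_s=1.0,
                                 penalty_per_extra_worker=0.02)
print(serial, parallel, congested)
```

Under these assumptions the serial and the 10-way parallel runs take the same time, and any nonzero penalty makes the parallel run slower. Parallel GETs only win in this model when `latency_s` is large relative to the transfer time, which is exactly the thing to measure on your own setup before committing to the extra complexity.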
