in reply to Re^4: Parallel downloading under Win32?
in thread Parallel downloading under Win32?
Getting it to run enough threads to run at comparable speed to the wget method required 300 mb.
How many wget instances were you running?
I'd be really surprised if it is necessary to run 20 threads in order to saturate your bandwidth, unless the server you are connecting to is severely restricting the throughput of individual connections. And when that happens--for example, if the site is using thttpd or similar--unless the webmaster is very naive, they ensure that the throttling rates apply across all concurrent connections from any given IP.
Running 2 or 3 connections concurrently usually serves to maximise throughput. Beyond that, thread thrash tends to degrade throughput rather than increase it. Newcomers to threads tend to think 'more is better', but the reality is that this is rarely the case.
Especially with TCP connections. TCP has been tuned over decades to utilise as much bandwidth as is available for each connection. Whilst using two concurrent connections will usually allow the second to 'mop up' any bandwidth under-utilised by the first, unless you have more than one processor/core, a third thread will usually impact the performance of the first two through thread thrash. (Assuming unrestricted and infinite bandwidth from the server.)
As a rule of thumb, I would suggest that you set $T (or $thread_count as you would have it :), to no more than 2 * NoOfCores (sorry $no_of_cores :).
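A minimal sketch of that rule of thumb. It assumes the core count can be read from the NUMBER_OF_PROCESSORS environment variable, which Windows normally sets; the helper name max_thread_count is made up for illustration, and the fallback to a single core is my own choice.

```perl
use strict;
use warnings;

# Hypothetical helper: cap the worker count at 2 * number of cores.
# Anything that is not a positive integer is treated as 1 core.
sub max_thread_count {
    my ($no_of_cores) = @_;
    $no_of_cores = 1
        unless defined $no_of_cores
            && $no_of_cores =~ /^\d+$/
            && $no_of_cores > 0;
    return 2 * $no_of_cores;
}

# Assumption: NUMBER_OF_PROCESSORS is set (it usually is under Win32).
my $thread_count = max_thread_count( $ENV{NUMBER_OF_PROCESSORS} );
print "Using at most $thread_count threads\n";
```

On other platforms the variable is usually absent, so this falls back to 2 threads, which is in line with the "2 or 3 connections" figure above.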
Caveat: From the code you posted, you are pushing your entire url list onto the queue prior to starting your threads. If your url list is relatively small--say < ne3--no harm done. But if your url list is bigger than that, then I would highly recommend starting your threads first and including a call to yield() in your url enqueue loop.
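Roughly what I mean, as a sketch rather than a drop-in replacement for your code. The worker only counts urls instead of downloading them, the example.com urls and the thread count of 2 are placeholders, and terminating the workers with one undef per thread is just one common convention.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $thread_count = 2;                 # placeholder value
my $Q = Thread::Queue->new;

# Worker: pull urls until an undef sentinel arrives.
sub worker {
    my $fetched = 0;
    while ( defined( my $url = $Q->dequeue ) ) {
        # The real download would go here; this sketch only counts.
        $fetched++;
    }
    return $fetched;
}

# Start the workers BEFORE filling the queue ...
my @workers = map { threads->create( \&worker ) } 1 .. $thread_count;

# ... then enqueue, yielding so the workers can start draining the
# queue while it is still being filled.
my @urls = map { "http://example.com/file$_" } 1 .. 5000;   # placeholder urls
for my $url (@urls) {
    $Q->enqueue($url);
    threads->yield;
}

# One undef per worker tells each of them to finish.
$Q->enqueue( (undef) x $thread_count );

my $total = 0;
for my $thr (@workers) {
    my ($n) = $thr->join;
    $total += $n;
}
print "Processed $total of ", scalar(@urls), " urls\n";
```

This way the queue never has to hold the whole list at once, provided the workers keep up with the enqueue loop.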
Caveat 2: If you are seriously seeking to minimise memory usage, then you should consider starting your thread pool prior to loading (useing) the vast majority of whatever code or modules are needed by the main body of your application.
The reason for this advice is that, for good or bad, the original author(s) of threads decided that each spawned thread would inherit everything already loaded by the main thread at the point of thread creation(*). (That is, he/they decided to emulate the fork way of working!) By starting your worker threads early--remember that use is a compile-time enacted opcode--you can minimise the size of the primary thread and therefore the size of every subsequently spawned thread.
(*) Yes. I know it is dumb, but you try convincing those that have the power to change things of that!
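One way to arrange that, sketched under some assumptions: because use is enacted at compile time, the sketch loads the main thread's extra module with require, which runs at run time and so can come after the workers have been created. POSIX is only a stand-in for whatever heavy modules your main body needs, and the worker is a do-nothing placeholder that reports whether POSIX was present in its own copy of the interpreter.

```perl
use strict;
use warnings;
use threads;          # only what the workers themselves need
use Thread::Queue;    # is loaded up front

my $Q = Thread::Queue->new;

# Placeholder worker: drain the queue, then report whether POSIX
# was already loaded in the interpreter this thread was cloned from.
sub worker {
    1 while defined $Q->dequeue;
    return exists $INC{'POSIX.pm'} ? 'loaded' : 'not loaded';
}

# Spawn the pool while the main thread is still small.
my @workers = map { threads->create( \&worker ) } 1 .. 2;

# Only now pull in what the main body needs. require runs at run time,
# so the already-running workers do not carry a copy of this module.
require POSIX;
my $stamp = POSIX::strftime( '%Y-%m-%d', localtime );

$Q->enqueue( (undef) x @workers );    # tell the workers to finish
my @seen = map { my ($s) = $_->join; $s } @workers;

print "Main thread loaded POSIX on $stamp; workers saw POSIX: @seen\n";
```

Anything the workers themselves need still has to be loaded before they are spawned, or required inside the thread.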