Getting it to run enough threads to run at comparable speed to the wget method required 300 mb.

How many wget instances were you running?

I'd be really surprised if it is necessary to run 20 threads in order to saturate your bandwidth, unless the server you are connecting to is severely restricting the throughput of individual connections. And when that happens--for example, if the site is using thttpd or similar--unless the webmaster is very naive, they will ensure that the throttling rates apply across all concurrent connections from any given IP.

Running 2 or 3 connections concurrently usually serves to maximise throughput. Beyond that, thread thrash tends to degrade throughput rather than increase it. Threads newbies tend to think 'more is better', but the reality is that this is rarely the case.

Especially with TCP connections. TCP has been tuned over decades to utilise as much bandwidth as is available for each connection. Whilst using two concurrent connections will usually allow the second to 'mop up' any bandwidth under-utilised by the first, unless you have more than one processor/core, a third thread will usually impact the performance of the first two through thread thrash. (Assuming unrestricted and infinite bandwidth from the server.)

As a rule of thumb, I would suggest that you set $T (or $thread_count as you would have it :), to no more than 2 * NoOfCores (sorry $no_of_cores :).
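To make that rule of thumb concrete, here is a minimal sketch. It assumes you are on Win32, where the NUMBER_OF_PROCESSORS environment variable is normally set; the fallback to 1 and the variable names are my own choices, not anything from your code.

```perl
use strict;
use warnings;

# Assumption: Windows sets NUMBER_OF_PROCESSORS; fall back to 1 if it is absent.
my $no_of_cores = $ENV{NUMBER_OF_PROCESSORS} || 1;

# Rule of thumb from above: no more than 2 threads per core.
my $thread_count = 2 * $no_of_cores;

print "Using up to $thread_count worker threads\n";
```

Treat the result as an upper bound and measure; 2 or 3 may well turn out to be enough.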


Caveat: From the code you posted, you are pushing your entire url list onto the queue prior to starting your threads. If your url list is relatively small--say < ne3--no harm done. But if your url list is bigger than that, then I would highly recommend starting your threads first and including a call to yield() in your url enqueue loop.
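Roughly what I mean, as a sketch rather than a drop-in replacement for your code. The url list is made up, the worker only counts what it dequeues instead of downloading anything, and $T is hard-coded; it assumes a threads-enabled perl.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $T = 4;                        # hypothetical pool size
my $Q = Thread::Queue->new;

# Worker: pull urls until an undef arrives; the real fetch would go here.
sub worker {
    my $count = 0;
    while ( defined( my $url = $Q->dequeue ) ) {
        ++$count;                 # placeholder for the actual download
    }
    return $count;
}

# Start the pool BEFORE filling the queue.
my @workers = map { threads->create( { context => 'scalar' }, \&worker ) } 1 .. $T;

# Made-up url list, standing in for yours.
my @urls = map { "http://example.com/file$_" } 1 .. 100;

for my $url (@urls) {
    $Q->enqueue($url);
    threads->yield;               # give the workers a chance to drain the queue
}

# One undef per worker tells each of them to finish.
$Q->enqueue( (undef) x $T );

my $total = 0;
$total += $_->join for @workers;
print "processed $total urls\n";
```

That way the queue never has to hold the whole list at once, because the workers are consuming while the main thread is still enqueuing.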

Caveat 2: If you are seriously seeking to minimise memory usage, then you should consider starting your thread pool prior to loading (use-ing) the vast majority of whatever code or modules are needed by the main body of your application.

The reason for this advice is that, for good or bad, the original author(s) of threads decided that each spawned thread would inherit everything already loaded by the main thread at the point of thread creation(*). (i.e. he/they decided to emulate the fork way of working!) By starting your worker threads early--remember that use is a compile-time enacted opcode--you can minimise the size of the primary thread and therefore the size of every subsequently spawned thread.

(*) Yes. I know it is dumb, but you try convincing those that have the power to change things of that!
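A sketch of that ordering, under some assumptions: Text::Wrap is only a stand-in for whatever heavy modules your main thread needs, the pool size is arbitrary, and the workers do nothing useful beyond reporting whether they inherited the late-loaded module. Because use happens at compile time, the late load has to be a run-time require.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $T = 2;                        # hypothetical pool size
my $Q = Thread::Queue->new;

# Spawn the pool while the main thread is still small.
my @workers = map {
    threads->create( { context => 'scalar' }, sub {
        # Modules needed only by the workers could be require'd here.
        my $done = 0;
        ++$done while defined $Q->dequeue;
        # Report whether this thread's copy carries the late-loaded module.
        return exists $INC{'Text/Wrap.pm'} ? 1 : 0;
    } );
} 1 .. $T;

# Only now load what the main thread needs. require runs at run time,
# whereas a use statement here would be hoisted to compile time.
require Text::Wrap;
Text::Wrap->import('wrap');

$Q->enqueue( 'job', (undef) x $T );

my $inherited = 0;
$inherited += $_->join for @workers;
print "workers that inherited Text::Wrap: $inherited\n";
```

The threads are cloned before the require, so none of them carries a copy of the module; only the main thread pays for it.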


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^5: Parallel downloading under Win32? by BrowserUk
in thread Parallel downloading under Win32? by Xenofur
