in reply to Combining LWP::Parallel::UserAgent with WWW::Mechanize

You should consider just forking and using Mechanize from multiple processes instead. My experience with LWP::Parallel::UserAgent has been that it performs badly compared to a multi-process approach.
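
Something like this is what I have in mind -- a minimal sketch, assuming you have a list of URLs and a platform with a real fork(); the fixed chunking is just one simple way to divide the work:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my @urls    = @ARGV;    # URLs to fetch
    my $workers = 5;        # number of child processes

    # Deal the URLs out into one chunk per worker.
    my @chunks;
    push @{ $chunks[ $_ % $workers ] }, $urls[$_] for 0 .. $#urls;

    for my $chunk (@chunks) {
        next unless $chunk;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        next if $pid;       # parent: go fork the next worker

        # Child: each process gets its own Mechanize object.
        my $mech = WWW::Mechanize->new( autocheck => 0 );
        for my $url (@$chunk) {
            my $res = $mech->get($url);
            warn "$url: ", $res->status_line, "\n" unless $res->is_success;
            # ... do something with $mech->content here ...
        }
        exit 0;
    }

    # Parent: wait for all the children to finish.
    1 while waitpid( -1, 0 ) > 0;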

Re^2: Combining LWP::Parallel::UserAgent with WWW::Mechanize
by tphyahoo (Vicar) on Apr 21, 2006 at 13:39 UTC
    Thanks Perrin.

    Apart from LWP::Parallel, which you didn't like, I have tracked down two basic approaches to parallelizing the downloads:

    1. Thread::Queue -- Re: What is the fastest way to download a bunch of web pages? -- (thanks BrowserUK; see the sketch after this list)
    2. Parallel::ForkManager (Suggested by jasonk above, and also mentioned on the "fastest way to download" thread)
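
    As I understand it, the Thread::Queue version looks roughly like this -- a minimal sketch, assuming a threads-enabled Perl and that each worker thread gets its own Mechanize object and pulls URLs off a shared queue:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use threads;
        use Thread::Queue;
        use WWW::Mechanize;

        my @urls    = @ARGV;
        my $workers = 5;

        # Pre-load the queue with all the URLs.
        my $q = Thread::Queue->new(@urls);

        # Each worker pulls URLs until the queue is drained.
        my @threads = map {
            threads->create( sub {
                my $mech = WWW::Mechanize->new( autocheck => 0 );
                while ( defined( my $url = $q->dequeue_nb ) ) {
                    my $res = $mech->get($url);
                    warn "$url: ", $res->status_line, "\n" unless $res->is_success;
                    # ... process $mech->content here ...
                }
            } );
        } 1 .. $workers;

        $_->join for @threads;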

    Do you think one way has any advantages over the other? Or are these ways essentially the same under the hood?

    FWIW I'm on Linux now (new job -- yay! now I get Perl in its native habitat :) ), which seems relevant here, since forking works better on Linux than on Windows.

    Also, to give a bit more context: I'll be downloading from potentially tens of thousands of websites, but no more than 100 pages from any one domain.

      I don't use threads, but I do know that memory consumption tends to be higher with threads than with an equivalent number of processes. I use Parallel::ForkManager, and it works well and reliably. Getting the results back to the parent can be more work than with threads, though.
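
      A minimal sketch of the kind of thing I mean, assuming each child writes what it fetched to its own file -- which is exactly the extra data-collecting work mentioned above:

          #!/usr/bin/perl
          use strict;
          use warnings;
          use Parallel::ForkManager;
          use WWW::Mechanize;
          use File::Temp qw(tempdir);

          my @urls = @ARGV;
          my $dir  = tempdir();                         # children drop results here
          my $pm   = Parallel::ForkManager->new(10);    # at most 10 children at once

          for my $i ( 0 .. $#urls ) {
              $pm->start and next;    # parent: fork a child, continue the loop

              # Child: fetch one URL and save the content to its own file.
              my $mech = WWW::Mechanize->new( autocheck => 0 );
              my $res  = $mech->get( $urls[$i] );
              if ( $res->is_success ) {
                  open my $fh, '>', "$dir/page_$i.html" or die "open: $!";
                  print {$fh} $mech->content;
                  close $fh;
              }
              $pm->finish;            # child exits here
          }
          $pm->wait_all_children;

          # Back in the parent, the results can be read from $dir.
          print "Results are in $dir\n";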