in reply to Re: Combining LWP::Parallel::UserAgent with WWW::Mechanize
in thread Combining LWP::Parallel::UserAgent with WWW::Mechanize

Thanks Perrin.

Apart from LWP::Parallel, which you didn't like, I have tracked down two basic approaches to running multiple workers:

  1. Thread::Queue -- Re: What is the fastest way to download a bunch of web pages? -- (thanks BrowserUK); sketched just below this list
  2. Parallel::ForkManager (suggested by jasonk above, and also mentioned in the "fastest way to download" thread)
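
Here's roughly what I have in mind for approach 1 -- a minimal worker-pool sketch, with made-up URLs and an arbitrary worker count, using WWW::Mechanize inside each thread since that's the module under discussion:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;
    use WWW::Mechanize;

    # Hypothetical URL list -- substitute your own.
    my @urls    = map { "http://example.com/page$_.html" } 1 .. 20;
    my $workers = 5;

    my $q = Thread::Queue->new(@urls);
    $q->enqueue(undef) for 1 .. $workers;   # one "stop" marker per worker

    my @threads = map {
        threads->create(sub {
            # each thread gets its own Mechanize object (don't share them)
            my $mech = WWW::Mechanize->new(autocheck => 0);
            while (defined(my $url = $q->dequeue)) {
                my $res = $mech->get($url);
                printf "%s => %s\n", $url, $res->status_line;
            }
        });
    } 1 .. $workers;

    $_->join for @threads;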

Do you think one way has any advantages over the other? Or are these ways essentially the same under the hood?

FWIW, I'm on Linux now (new job -- yay! now I get Perl in its native habitat :) ), which seems relevant since forking works better on Linux.

Also, to give a bit more context: I'll be downloading potentially tens of thousands of pages, but no more than 100 from any one particular domain.


Re^3: Combining LWP::Parallel::UserAgent with WWW::Mechanize
by perrin (Chancellor) on Apr 21, 2006 at 15:06 UTC
    I don't use threads, but I do know that memory consumption tends to be higher with threads than with an equivalent number of processes. I use Parallel::ForkManager, and it works well and reliably. Collecting the data can be more work than with threads, though.
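
    For example, here's a bare-bones Parallel::ForkManager loop (the URLs and output path are hypothetical). Each child writes its result to a file, because a forked child can't hand a Perl data structure back to the parent directly -- the parent has to collect the files afterward:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Parallel::ForkManager;
        use LWP::UserAgent;

        # Hypothetical inputs -- adjust to your own list and directory.
        my @urls    = map { "http://example.com/page$_.html" } 1 .. 20;
        my $out_dir = '/tmp/pages';
        mkdir $out_dir unless -d $out_dir;

        my $pm = Parallel::ForkManager->new(10);   # at most 10 children at once

        for my $i (0 .. $#urls) {
            $pm->start and next;   # parent: fork a child, continue the loop

            # Child: fetch one URL and save it to disk; the parent
            # collects results later by reading the files back in.
            my $ua  = LWP::UserAgent->new(timeout => 30);
            my $res = $ua->get($urls[$i]);
            if ($res->is_success) {
                open my $fh, '>', "$out_dir/page$i.html"
                    or die "can't write $out_dir/page$i.html: $!";
                print {$fh} $res->content;
                close $fh;
            }
            $pm->finish;   # child exits here
        }
        $pm->wait_all_children;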