in reply to Re: Crawling with Parallel::ForkManager
in thread Crawling with Parallel::ForkManager

Thanks for writing. Well, I try to access the web pages right after I stop (terminate, in this case) the program, not much later.

You are right: when I spawn 3 child processes (I have 4 right now), I see far fewer error messages. But even if I reduce it to 2 parallel connections, I still see error messages!

I can't think of a way out.


Replies are listed 'Best First'.
Re^3: Crawling with Parallel::ForkManager
by fullermd (Vicar) on Aug 07, 2009 at 22:44 UTC

    It really just depends on why the server is giving you the cold shoulder. I went with the most obvious cause: the number of simultaneous connections. If that's the case, dropping to 1 (i.e., not parallel at all) would resolve it. But it may do rate-limiting, shoving you away after a given number of requests in a particular time period. It may be server-load dependent. It may just be flat-out random.

    Likely, the only way you can find out for sure what's up is by talking to the server admin. The best solution code-wise is to be adaptive: if you start getting errors, slow down; if you get no errors for a while, speed up. But that's a lot of work to get right.
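
    A rough sketch of what "adaptive" could look like, in case it helps. All the names and numbers here are mine, not from the thread: each child keeps a delay between requests, doubles it after an error, and eases it back down after successes.

    ```perl
    use strict;
    use warnings;

    my $delay     = 1;    # current pause between requests, in seconds
    my $min_delay = 1;
    my $max_delay = 60;

    # Call after every request with a true/false success flag;
    # returns how long to sleep before the next request.
    sub next_delay {
        my ($success) = @_;
        if ($success) {
            $delay *= 0.8;                       # no complaints: speed back up
            $delay = $min_delay if $delay < $min_delay;
        }
        else {
            $delay *= 2;                         # server pushed back: slow down
            $delay = $max_delay if $delay > $max_delay;
        }
        return $delay;
    }

    # In the crawl loop (inside each child), something like:
    #   my $res = $ua->get($url);                # $ua: your LWP::UserAgent
    #   sleep next_delay($res->is_success);
    ```

    Note that with Parallel::ForkManager each child has its own copy of `$delay`, so one child hitting errors won't slow its siblings; sharing the delay across children (via a file, pipe, or the `run_on_finish` callback in the parent) is part of the "lot of work" mentioned above.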