in reply to Crawling with Parallel::ForkManager

I assume you are using Mechanize to retrieve the PDF files. Have you tried retrieving them without Parallel::ForkManager?

Re^2: Crawling with Parallel::ForkManager
by listanand (Sexton) on Aug 07, 2009 at 22:34 UTC
    Thanks for writing.

    I am using LWP::Simple (the mirror function) to retrieve the PDFs. Without Parallel::ForkManager, everything works fine.

      Just a guess here...

      Have you tried to download the PDF using the $mech connection you are already using? Say using:

      $mech->get($url_to_pdf); $mech->save_content( $filename );

      Maybe this is a cookie issue. I believe that $mech accepts cookies by default, whereas LWP::Simple's mirror makes its own separate request and, as far as I know, does not send the cookies that $mech has collected. If so, the web server may be refusing a direct request for the PDF that arrives without the cookie set by the earlier page.

      It might work for you in the browser since your browser would already have a cookie.
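
      To make the suggestion concrete, here is a rough sketch of fetching the PDFs with the same $mech inside Parallel::ForkManager children. It assumes the cookie is set by a start page that you fetch once before forking; each forked child then works on its own copy of $mech, cookie jar included. The names local_name and run, the limit of 4 children, and the command-line arguments are my own placeholders, not anything from your script.

```perl
use strict;
use warnings;
use URI;
use WWW::Mechanize;
use Parallel::ForkManager;

# Hypothetical helper: use the last path segment of the URL as the
# local filename, falling back to a fixed name if there is none.
sub local_name {
    my ($url) = @_;
    my ($name) = URI->new($url)->path =~ m{([^/]+)\z};
    return defined $name ? $name : 'download.pdf';
}

# Hypothetical driver: first argument is the page that sets the cookie,
# the remaining arguments are the PDF URLs to download.
sub run {
    my ( $start_url, @pdf_urls ) = @_;

    # autocheck => 0 so a failed request does not kill the process;
    # we inspect success() ourselves instead.
    my $mech = WWW::Mechanize->new( autocheck => 0 );

    # Fetch the start page BEFORE forking so the cookie jar is filled
    # in the parent and copied into every child.
    $mech->get($start_url);

    my $pm = Parallel::ForkManager->new(4);    # 4 children is an arbitrary choice

    for my $url (@pdf_urls) {
        $pm->start and next;                   # parent moves on to the next URL

        # Child: reuse the inherited $mech instead of LWP::Simple::mirror.
        $mech->get($url);
        if ( $mech->success ) {
            $mech->save_content( local_name($url) );
        }
        else {
            warn "Failed to fetch $url: ", $mech->status, "\n";
        }

        $pm->finish;                           # child exits here
    }

    $pm->wait_all_children;
}

# Only run when invoked as a script with arguments.
run(@ARGV) if @ARGV && !caller;
```

      You would call it along the lines of "perl fetch.pl START_URL PDF_URL1 PDF_URL2 ...". If the downloads succeed this way but still fail with mirror, that would point at the cookie; if they fail both ways, the problem is probably somewhere else.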