I am trying to use Parallel::ForkManager to do some crawling. I am new to Perl and am having some trouble with the crawling process. Here's the relevant piece of code:
    use Parallel::ForkManager;
    use LWP::Simple qw(mirror);
    # $mech is a WWW::Mechanize object and @identifier holds the list of
    # URLs to be crawled; both are set up earlier in the script.

    my $manager = Parallel::ForkManager->new(4);
    for (@identifier) {
        $manager->start and next;    # parent: move to next URL; child: do the work
        $mech->get($_);
        die $mech->response->status_line unless $mech->success;
        my $html = $mech->content;
        # ... some processing of the HTML to extract the location of the PDF file ...
        mirror($url, "/home/username/data/$file_name.pdf");
        $manager->finish;            # end of the child process
        sleep(2);
    }
    $manager->wait_all_children;
When I run this program, some PDF files are retrieved, but for many URLs I get an error like "Error GETing URL_NAME: Service Temporarily Unavailable at crawl.pl line 138". Yet the same URL_NAME loads fine when I view it in a browser. A large number of URLs are not being crawled because of this.
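In case it is relevant, this is the kind of retry wrapper I was thinking of adding around the get() call, so that a transient 503 does not kill the whole child. It is only an untested sketch: get_with_retries is a name I made up, the retry count and delays are guesses, and it assumes $mech was created with autocheck => 0 so that a failed get() returns instead of dying (my code above already checks $mech->success, so I believe that holds).

    use strict;
    use warnings;

    # Retry a GET a few times, pausing a little longer between attempts.
    # Assumes $mech is a WWW::Mechanize object built with autocheck => 0.
    sub get_with_retries {
        my ($mech, $url, $tries) = @_;
        $tries //= 3;
        for my $attempt (1 .. $tries) {
            $mech->get($url);
            return 1 if $mech->success;
            warn sprintf "Attempt %d for %s failed: %s\n",
                $attempt, $url, $mech->response->status_line;
            sleep(5 * $attempt);    # back off a bit more each time
        }
        return 0;    # let the caller decide whether to skip or die
    }

Inside the loop I would then call get_with_retries($mech, $_) instead of $mech->get($_), and call $manager->finish early to skip the URL when it still fails after all attempts.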
What am I missing?
Thanks in advance.