Ntav has asked for the wisdom of the Perl Monks concerning the following question:

I'm doing what I guess is a fairly common task with Perl: retrieving a number of data sources from the web, then parsing, extracting and analysing the data. My question relates to the retrieval step; I use something like this:
use LWP::Simple;

# each page is actually processed in a subroutine but you get the idea
$page1 = "http://www.first.com";
$page1 = get($page1);

# rest of the pages in same format here
$pageN = "http://www.last.com";
$pageN = get($pageN);
Now this has (at least) two problems which I need to solve:

1. Each of the pages is retrieved in turn, whereas given the speed of the server the script runs on I want to get them all at once. Q1: how do I implement multithreading here?

2. If a page fails to respond I don't want the script to wait more than N seconds before moving on to the next one. Q2: how do I time a (sub)process and kill it after N seconds?

Thanks for any help,
Ntav

Re: retrieving multiple web documents
by wog (Curate) on Aug 30, 2001 at 06:09 UTC

      POE::Component::Client::HTTP looks interesting, too, although when I had a quick look at it the documentation wasn't too great so I never got any code working. POE looks like an interesting tool for dealing with a variety of parallel tasks in Perl, though.
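      From a quick skim of its docs, the rough shape seems to be something like this (completely untested on my end, and the URLs, the 'ua' alias and the 10-second timeout are just placeholders):

      use strict;
      use warnings;
      use POE qw(Component::Client::HTTP);
      use HTTP::Request::Common qw(GET);

      # spawn one HTTP client component; Timeout covers the "give up after N seconds" part
      POE::Component::Client::HTTP->spawn(
          Alias   => 'ua',
          Timeout => 10,
      );

      POE::Session->create(
          inline_states => {
              _start => sub {
                  my $kernel = $_[KERNEL];
                  # queue every request up front; the component fetches them concurrently
                  for my $url ('http://www.first.com', 'http://www.last.com') {
                      $kernel->post( 'ua', 'request', 'got_response', GET($url) );
                  }
              },
              got_response => sub {
                  my ($request_packet, $response_packet) = @_[ARG0, ARG1];
                  my $request  = $request_packet->[0];
                  my $response = $response_packet->[0];
                  print $request->uri, ": ", $response->status_line, "\n";
              },
          },
      );

      POE::Kernel->run;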

      Ntav's code might benefit from using an array of URLs instead of a series of scalars. For example:

      use LWP::Simple;

      my @page = qw(http://www.first.com http://www.last.com);
      foreach my $url (@page) {
          my $content = get($url);
          # ... parse/extract from $content here ...
      }

Re: retrieving multiple web documents
by Zaxo (Archbishop) on Aug 30, 2001 at 06:18 UTC

    Just recently, Parallel::ForkManager was reviewed here. Its pod includes a snippet for doing just what you want.
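    A rough sketch of what that could look like, combining it with an LWP::UserAgent timeout to cover the second question (untested; the URLs, the five parallel children and the 10-second timeout are only placeholders):

    use strict;
    use warnings;
    use LWP::UserAgent;
    use Parallel::ForkManager;

    my @urls = qw(http://www.first.com http://www.last.com);

    # fetch up to 5 pages at once, each in its own child process
    my $pm = Parallel::ForkManager->new(5);

    foreach my $url (@urls) {
        $pm->start and next;    # parent: fork a child, then move to the next URL

        # child: fetch one page, giving up after 10 seconds
        my $ua = LWP::UserAgent->new( timeout => 10 );
        my $response = $ua->get($url);
        if ( $response->is_success ) {
            # ... parse/extract from $response->content here ...
        }
        else {
            warn "$url failed: ", $response->status_line, "\n";
        }

        $pm->finish;            # child exits
    }

    $pm->wait_all_children;

    One thing to keep in mind: each page is fetched and processed inside its own child, so anything you want back in the parent has to be written somewhere it can read it (a file, a database, etc.).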

    After Compline,
    Zaxo