Thanks. I took a closer look at LWP::Parallel and it may come in handy later¹. From the documentation I see how to fetch batches of links, and I wonder whether there is any way to parallelize the processing of the results in the same move, so that fetch+process runs continuously rather than fetch in parallel, wait, and then process in parallel (I've sketched one idea after the script below).
I'll be digging into the options mentioned in the other threads, too.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Parallel::UserAgent;
use HTTP::Request;

my @feeds = (
    'http://localhost/feed1.xml', # rss
    'http://localhost/feed2.xml', # atom
    'http://localhost/foo',       # 404
);

my $requests = prepare_requests(@feeds);
my $entries  = fetch_feeds($requests);

foreach my $k (keys %$entries) {
    my $res = $entries->{$k}->response;
    print "Answer for '", $res->request->url, "' was \t",
          $res->code, ": ", $res->message, "\n";
    # $res->content, "\n";
}
exit(0);

sub prepare_requests {
    my (@feeds) = @_;
    my $requests;
    foreach my $url (@feeds) {
        push(@$requests, HTTP::Request->new('GET', $url));
    }
    return $requests;
}

sub fetch_feeds {
    my ($requests) = @_;
    my $pua = LWP::Parallel::UserAgent->new();
    $pua->in_order  (0); # do NOT force handling in order of registration
    $pua->duplicates(1); # do NOT ignore duplicate requests
    $pua->timeout   (9); # in seconds
    $pua->redirect  (1); # follow redirects
    $pua->max_hosts (3); # max locations accessed in parallel
    foreach my $req (@$requests) {
        print "Registering '", $req->url, "'\n";
        if (my $res = $pua->register($req, \&handle_answer, 8192, 1)) {
            # register() only returns a response object on error
            print STDERR $res->error_as_HTML;
        } else {
            print qq(ok\n);
        }
    }
    my $entries = $pua->wait();
    return $entries;
}

sub handle_answer {
    my ($content, $response, $protocol, $entry) = @_;
    if (length($content)) {
        # another chunk arrived: append it to the response body
        $response->add_content($content);
    } else {
        # zero-length chunk: the server has closed the connection
    }
    return undef; # undef tells the UserAgent to carry on
}
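
On the fetch+process question above: since register() hands every chunk to the callback, one idea would be to do the per-feed processing inside the callback itself, the moment a response completes (a zero-length chunk means the server has closed the connection), instead of waiting for wait() to return. A minimal, untested sketch; it assumes XML::Feed from CPAN for the parsing, and handle_answer_and_process/process_feed are hypothetical names:

# Hypothetical drop-in replacement for handle_answer: parse each feed
# as soon as its connection closes, while other fetches are still live.
sub handle_answer_and_process {
    my ($content, $response, $protocol, $entry) = @_;
    if (length($content)) {
        $response->add_content($content); # still downloading: buffer it
    } else {
        # zero-length chunk: body is complete, process it right here
        process_feed($response) if $response->is_success;
    }
    return undef; # keep the event loop going
}

sub process_feed {
    my ($response) = @_;
    require XML::Feed; # assumption: XML::Feed is installed
    my $body = $response->content;
    my $feed = XML::Feed->parse(\$body);
    unless ($feed) {
        warn "Could not parse ", $response->request->url, ": ",
             XML::Feed->errstr, "\n";
        return;
    }
    my @entries = $feed->entries;
    print "Parsed '", $feed->title, "' with ", scalar(@entries), " entries\n";
}

You would register with \&handle_answer_and_process instead of \&handle_answer; wait() still returns the complete set at the end. One caveat: LWP::Parallel drives everything from a single select() loop in one process, so this interleaves processing with fetching rather than running the two truly in parallel. If the parsing turns out to be heavy, handing completed responses off to forked workers (Parallel::ForkManager, say) would be the next step.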
¹ That's the thing about CPAN: there are so many useful modules with great accompanying documentation that discovery itself can be a challenge. So I am very appreciative of everyone's input here.