Thanks. OK, so here's what I've got so far: I've got this thing doing asynchronous HEADs, with up to 10 HEAD requests in flight at a time. But, and here's the but... it doesn't really seem to be working any faster than an LWP-based version forked into 10 processes.
Perhaps you were right, Corion. Maybe this is not the kind of optimisation my crawler needs. Any other ideas?
#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
use AnyEvent::HTTP;
my ( $domain, $time );
my @condvars;
my $START_TIME  = time;
my $MAX_QUERIES = 10;    # concurrent HEAD requests per batch
my $done        = 0;
my $heads       = 0;

my $eof = 0;
until ($eof)
{
    # fire off a batch of concurrent HEAD requests
    for my $i ( 1 .. $MAX_QUERIES )
    {
        $domain = <>;
        if ( !defined $domain ) { $eof = 1; last }    # stop at end of input
        chomp $domain;
        my $http_url = "http://www." . $domain;
        my $condvar  = AnyEvent->condvar;
        push @condvars, $condvar;
        http_request HEAD => $http_url, sub
        {
            my ( $body, $hdr ) = @_;
            #warn Dumper @_;
            # AnyEvent::HTTP reports connect/DNS failures as pseudo-status
            # 59x; anything else means the server answered with headers
            $heads++ if $hdr->{Status} !~ /^59/;
            $condvar->send;
        };
    }

    # count this batch (it may be short on the last read), then wait
    # for every outstanding request in it to complete
    $done += @condvars;
    while ( my $condvar = pop @condvars )
    {
        $condvar->recv;
    }

    $time = time - $START_TIME;
    print "Tried $done domains, $heads headers found in $time seconds.\n";
}
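One thing I'm starting to suspect: because I wait for the whole batch before starting the next one, every batch runs at the speed of its slowest host, so one dead server stalls all 10 slots. For comparison, here's a sliding-window sketch I haven't benchmarked (the $WINDOW constant and the start_next sub are just illustrative names) that keeps 10 requests in flight at all times using AnyEvent's condvar begin/end counting:

#!/usr/bin/perl
use strict;
use warnings;
use AnyEvent;
use AnyEvent::HTTP;

my $WINDOW = 10;    # requests to keep in flight at all times
my $heads  = 0;
my $done   = 0;

my $cv = AnyEvent->condvar;
$cv->begin;    # guard: keep the count above zero while seeding

sub start_next
{
    # blocking read of the next domain; fine for a sketch
    my $domain = <>;
    return unless defined $domain;    # input exhausted: let the count drain
    chomp $domain;
    $cv->begin;
    http_request HEAD => "http://www." . $domain, sub
    {
        my ( $body, $hdr ) = @_;
        $heads++ if $hdr->{Status} !~ /^59/;
        $done++;
        start_next();    # refill the window immediately
        $cv->end;
    };
}

start_next() for 1 .. $WINDOW;    # seed the window
$cv->end;                         # drop the guard
$cv->recv;                        # block until every request has finished

print "Tried $done domains, $heads headers found.\n";

This way a slow host only ties up one of the 10 slots instead of gating the whole batch. That said, if the forked LWP version was already saturating DNS or bandwidth, neither approach will look any faster.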