I have 70 000 URLs (almost all on different servers) to check every day, and I have to check them as quickly as possible (60 minutes at most). Depending on the server a document is located on, a HEAD request takes from 1 to 4 seconds. So if I want to make 70 000 / 60 / 60 ≈ 20 requests per second, I need at least 50 threads working in parallel. CPU load isn't very high, because a thread spends almost all of its time waiting for the server's response.
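The thread-count estimate follows from Little's law (concurrency ≈ throughput × latency). Here is a quick sketch of that arithmetic; the 2.5 s average latency is my assumption for the midpoint of the 1-4 s range:

    use strict;
    use warnings;

    my $urls        = 70_000;
    my $window_sec  = 60 * 60;                 # one hour
    my $rate        = $urls / $window_sec;     # required requests per second
    my $avg_latency = 2.5;                     # assumed midpoint of the 1-4 s range
    my $threads     = $rate * $avg_latency;    # concurrent requests needed (Little's law)

    printf "rate = %.1f req/s, threads ~ %.0f\n", $rate, $threads;

This prints "rate = 19.4 req/s, threads ~ 49", so 50 threads is a sensible floor; servers at the slow 4 s end would push the number higher.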
I'll rewrite the code a bit.
# needs: use threads; use Thread::Queue;
#        use LWP::UserAgent; use HTTP::Request::Common 'HEAD';
sub thread_do
{
    my $tid = threads->tid();
    my $ua  = LWP::UserAgent->new(timeout => 3);   # reuse one agent per thread

    # while not all urls are checked
    while (!$DONE) {
        # get new url from boss thread (undef is the stop signal,
        # so a thread blocked in dequeue() can still be woken up)
        my $url = $task_q->dequeue();
        last unless defined $url;

        # check url with a HEAD request
        my $res = $ua->request(HEAD $url);

        # return result to the boss thread
        $result_q->enqueue(
            "$tid;$url;"
            . $res->code()
            . ";"
            . $res->message()
            . ";"
        );
    }
}
$DONE is a shared variable; the boss thread sets it to true once all URLs are checked.
I used prethreading, which means I create a fixed number of threads up front and each thread processes the same kind of task many times. In my example, each thread keeps making HEAD requests until all URLs are checked.
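For completeness, the boss side can be sketched roughly like this. It is a minimal, self-contained illustration, not my production code: the stub worker stands in for thread_do above (no network calls), the URL list is a placeholder, and I use 5 threads instead of 50 to keep the example small. Note the shutdown detail: a worker blocked in dequeue() never re-checks $DONE, so the boss also enqueues one undef per worker as a wake-up/stop signal, which the worker above treats as "exit".

    use strict;
    use warnings;
    use threads;
    use threads::shared;
    use Thread::Queue;

    my $DONE : shared = 0;
    my $task_q   = Thread::Queue->new();
    my $result_q = Thread::Queue->new();

    # stand-in worker: the real thread_do would make the HEAD request here
    sub thread_do {
        while (!$DONE) {
            my $url = $task_q->dequeue();
            last unless defined $url;        # undef is the stop signal
            $result_q->enqueue(threads->tid() . ";$url;200;OK;");
        }
    }

    # prethreading: create all workers once, up front
    my @workers = map { threads->create(\&thread_do) } 1 .. 5;

    # placeholder task list; the real one would be the 70 000 URLs
    my @urls = map { "http://example$_.com/" } 1 .. 20;
    $task_q->enqueue(@urls);

    # collect one result line ("tid;url;code;message;") per url
    my @results;
    push @results, $result_q->dequeue() for 1 .. @urls;

    # signal shutdown, wake any worker still blocked in dequeue(), join
    $DONE = 1;
    $task_q->enqueue((undef) x @workers);
    $_->join() for @workers;

    print scalar(@results), " urls checked\n";

The queues can be shared between threads because Thread::Queue stores its data in shared memory, so each thread's copy of the queue object refers to the same underlying queue.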