I have 70 000 URLs (almost all on different servers) to check every day, and I have to check them as quickly as possible (60 minutes at most). Depending on the server a document is located on, a HEAD request takes from 1 to 4 seconds. So if I want to make 70 000 / 60 / 60 ≈ 20 requests per second, I need at least 50 threads working in parallel. CPU load isn't very high, because a thread spends almost all of its time waiting for the server's response.
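The thread-count estimate follows from Little's law (concurrency ≈ throughput × latency). Here is a quick sketch of that arithmetic; the 2.5 s average latency is my assumption for the midpoint of the 1-4 s range:

    use strict;
    use warnings;

    my $urls        = 70_000;
    my $window_sec  = 60 * 60;                 # one hour
    my $rate        = $urls / $window_sec;     # required requests per second
    my $avg_latency = 2.5;                     # assumed midpoint of the 1-4 s range
    my $threads     = $rate * $avg_latency;    # concurrent requests needed (Little's law)

    printf "rate = %.1f req/s, threads ~ %.0f\n", $rate, $threads;

This prints "rate = 19.4 req/s, threads ~ 49", so 50 threads is a sensible floor; servers at the slow 4 s end would push the number higher.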
I'll rewrite the code a bit.
# needs: use threads; use Thread::Queue;
#        use LWP::UserAgent; use HTTP::Request::Common 'HEAD';
sub thread_do
{
    my $tid = threads->tid();
    my $ua  = LWP::UserAgent->new(timeout => 3);   # reuse one agent per thread

    # while not all urls are checked
    while (!$DONE) {
        # get new url from boss thread (undef is the stop signal,
        # so a thread blocked in dequeue() can still be woken up)
        my $url = $task_q->dequeue();
        last unless defined $url;

        # check url with a HEAD request
        my $res = $ua->request(HEAD $url);

        # return result to the boss thread
        $result_q->enqueue(
            "$tid;$url;"
            . $res->code()
            . ";"
            . $res->message()
            . ";"
        );
    }
}
$DONE is a shared variable; the boss thread sets it to true once all URLs are checked.
I used prethreading, which means I create a fixed number of threads up front and each thread processes the same kind of task many times. In my example, each thread keeps making HEAD requests until all URLs are checked.
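For completeness, the boss side can be sketched roughly like this. It is a minimal, self-contained illustration, not my production code: the stub worker stands in for thread_do above (no network calls), the URL list is a placeholder, and I use 5 threads instead of 50 to keep the example small. Note the shutdown detail: a worker blocked in dequeue() never re-checks $DONE, so the boss also enqueues one undef per worker as a wake-up/stop signal, which the worker above treats as "exit".

    use strict;
    use warnings;
    use threads;
    use threads::shared;
    use Thread::Queue;

    my $DONE : shared = 0;
    my $task_q   = Thread::Queue->new();
    my $result_q = Thread::Queue->new();

    # stand-in worker: the real thread_do would make the HEAD request here
    sub thread_do {
        while (!$DONE) {
            my $url = $task_q->dequeue();
            last unless defined $url;        # undef is the stop signal
            $result_q->enqueue(threads->tid() . ";$url;200;OK;");
        }
    }

    # prethreading: create all workers once, up front
    my @workers = map { threads->create(\&thread_do) } 1 .. 5;

    # placeholder task list; the real one would be the 70 000 URLs
    my @urls = map { "http://example$_.com/" } 1 .. 20;
    $task_q->enqueue(@urls);

    # collect one result line ("tid;url;code;message;") per url
    my @results;
    push @results, $result_q->dequeue() for 1 .. @urls;

    # signal shutdown, wake any worker still blocked in dequeue(), join
    $DONE = 1;
    $task_q->enqueue((undef) x @workers);
    $_->join() for @workers;

    print scalar(@results), " urls checked\n";

The queues can be shared between threads because Thread::Queue stores its data in shared memory, so each thread's copy of the queue object refers to the same underlying queue.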