I have 70 000 URLs (almost all on different servers) to check every day and I have to check them as quickly as possible (max 60 min). Depending on sever on witch document is located it take from 1 to 4 seconds to make head request. So, if I want make 70 000 / 60 / 60 = 20 requests per second, I need at least 50 threads working parallel. CPU load isn’t very high because almost all the time thread is waiting for response from the server.
I’ll rewrite code a bit.
sub thread_do
{
# while not all urls are checked
while (!$DONE) {
# get new url from boss thread
my $url = $task_q->dequeue();
# check url
my $ua = LWP::UserAgent->new(timeout => 3);
my $res = $ua->request(HEAD $url);
# return result to the boss thread
$result_q->enqueue(
"$tid;$url;"
. $res->code()
. ";"
. $res->message()
. ";"
);
}
$DONE is a shared variable and the boss thread set it true when all urls are checked.
I used prethreading that means that I create number of threads and each thread process the same task number of times. In my example each thread makes head request until all urls are checked.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.