Okay. Here are a few of the causes of your slowdown:

  1. Your main (boss) thread has a tight loop (while( !$exit ){), which means it spends most of its timeslices, and 90%+ of your cpu, thrashing around waiting for the threads to send it some results--which they can't do because your main thread is consuming all the cpu!

    Inserting a brief pause (select undef,undef,undef, 0.1;) into your main thread loop drops the overall cpu usage from 99% to 3-4%.

  2. You are creating a new user agent (my $ua = LWP::UserAgent->new(timeout => 3);) for every request you make.

    The simple expedient of moving that outside the loop and re-using the same user agent for each request made by the thread speeds the processing and reduces the memory consumption/thrashing enormously.

    I think this was the main cause of your slowdown.

  3. Your calculation of the number of threads required doesn't make sense.

    If the load imposed by running 20 threads uses all your cpu, adding another 80 into the mix will not help. Your code will just spend more time swapping and less time processing.

    The trick is always to start with a few threads, check that you aren't leaking memory or thrashing the cpu to death, and then increase the number of threads until adding more no longer gives any greater throughput.
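The first two points can be sketched as below. This is only an illustration, not your code: fake_request() and run_pool() are hypothetical names, and fake_request() sleeps for 10 ms instead of making a real HEAD request so that the sketch runs without a network. The parts that matter are the short pause in the boss loop and the comment marking where per-thread setup (such as the user agent) would go.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for one HEAD request: sleeps 10 ms
# instead of touching the network.
sub fake_request {
    my ($job) = @_;
    select undef, undef, undef, 0.01;
    return "job $job done";
}

# Hypothetical helper: run @jobs through $n_threads workers.
sub run_pool {
    my ( $n_threads, @jobs ) = @_;
    my $task_q   = Thread::Queue->new(@jobs);
    my $result_q = Thread::Queue->new();

    my @workers = map {
        threads->create( sub {
            # Per-thread setup (e.g. creating the user agent) belongs
            # here, once, outside the loop.
            while ( defined( my $job = $task_q->dequeue_nb ) ) {
                $result_q->enqueue( fake_request($job) );
            }
        } );
    } 1 .. $n_threads;

    my @results;
    while ( @results < @jobs ) {
        # Brief pause so the boss thread does not spin on the cpu.
        select undef, undef, undef, 0.1;
        while ( defined( my $r = $result_q->dequeue_nb ) ) {
            push @results, $r;
        }
    }
    $_->join for @workers;
    return @results;
}

my @results = run_pool( 4, 1 .. 20 );
print scalar(@results), " results\n";
```

Using dequeue_nb in the workers is a choice made for this sketch: it lets each thread stop as soon as the queue is empty rather than block on it.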

With those few changes, I managed to process 2270 HEAD requests in 97 seconds. And that is with my 40kb/s dial-up connection, using just 10* threads (originally posted as 25)!

* Update: Once I made the number of threads a command line parameter, I found that I get no discernible increase in throughput once I move above 10 threads. Given that 10 threads use barely 10% of my cpu, the limit on throughput seems to be solely the limited bandwidth of my connection. If you have a faster connection, you may be able to increase throughput by using more threads, but don't go mad. Start with 10 and increase in small jumps.
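One possible way to make the thread count a command line parameter is sketched here. The helper name thread_count() and the default of 10 are assumptions for illustration; this is not the exact change I made.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: use the first command-line argument as the
# thread count, falling back to a default when none is given.
sub thread_count {
    my ( $arg, $default ) = @_;
    return $default unless defined $arg;
    die "Thread count must be a positive integer, got '$arg'\n"
        unless $arg =~ /^[1-9]\d*$/;
    return $arg;
}

my $max_thread = thread_count( $ARGV[0], 10 );
warn "Using $max_thread threads\n";
```

Running the script with 10, then 15, then 20 as the argument and comparing the reported times makes it easy to see where the throughput stops improving.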

    [16:57:57.93] P:\test>418095 >nul
    Queued urls: 2270
    Time:97
    Done

By my calculations, that works out to a throughput of roughly 84,000 urls an hour--which I think well exceeds your requirements. With a little optimisation, this could probably be sped up considerably.
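For reference, the hourly figure is just the measured run scaled up to 3600 seconds. A minimal sketch of the arithmetic, using the 2270 urls and 97 seconds reported above (the variable names are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $urls    = 2270;    # urls processed in the test run
my $seconds = 97;      # elapsed time reported by the script

my $per_second = $urls / $seconds;
my $per_hour   = $per_second * 3600;

printf "%.1f urls/s, about %d urls/hour\n", $per_second, $per_hour;
```

This assumes the rate holds steady over a full hour, which a 97-second run cannot really confirm.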

Note: This was done on Win32--I have no feel for what sort of results you will get under Linux.

I would be most grateful to hear what sort of throughput you get, and with what number of threads, on your system.

Here is the version of your code I used to get the above results:

    #!/usr/bin/perl
    use strict;
    use threads;
    use threads::shared;
    use LWP::UserAgent;
    use HTTP::Request::Common;
    use Thread::Queue;

    $| = 1;

    my $thread_num : shared = 0;
    my $max_thread = 25;
    my $exit = 0;
    my $dump = 0;
    my $start_time = 0;
    my %tid : shared = ();

    my $task_q = Thread::Queue->new();
    my $result_q = Thread::Queue->new();

    my @urls = <DATA>;
    chomp @urls;
    $task_q->enqueue( @urls );
    undef @urls;

    warn "Queued urls: ", $task_q->pending, "\n";
    $start_time = time();

    threads->new(\&thread_do) for (1..$max_thread);

    while ( $task_q->pending ) {
        select undef, undef, undef, 0.1;
        print $result_q->dequeue(), "\n" while $result_q->pending();
        if ($dump++ > 100000) {
            # print "Dump\n";
            dump_tid(\%tid);
            $dump = 0;
        }
    }
    sleep 3; ## Give the task threads time to finish up
    warn "\n\nTime:" . (time() - $start_time) . "\n";
    warn "Done\n";

    sub thread_do {
        threads->self->detach();
        my $tid = threads->self->tid();
        my $ua = LWP::UserAgent->new( timeout => 3 );
        while ( $task_q->pending ) {
            my $url = $task_q->dequeue;
            my $res = $ua->request( HEAD $url );
            $result_q->enqueue( "$tid; $url ::= " . $res->code() . ";" );
            lock %tid;
            $tid{ $tid }++;
        }
    }

    sub dump_tid {
        # my $tid = shift;
        # open (DUMP, "> dump.txt");
        # print DUMP "$_ = $tid->{$_}\n" foreach keys %$tid;
        # close DUMP;
    }

Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

In reply to Re: Problem with ithreads by BrowserUk
in thread Problem with ithreads by 2NetFly
