karthick has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I am working on a project which requires me to place 250 parallel HTTP requests, get their responses, and take some action based on each response. I created 250 threads and let each thread handle one request plus the logic that acts on the incoming response. My problem is that my application is limited to 250 requests, meaning at any point in time there should be exactly 250 threads in operation. Say 10 of those 250 threads have completed their work and exited; I need to replace those 10 threads with new ones for new URLs, so that there are always 250 threads running. Is there any way to keep track of the number of threads in operation and create a corresponding number of new threads as threads exit, keeping the total number of running threads constant at 250 at all times? Please suggest modifications to the code. Thanks in advance. The code I am currently using:
use strict;
use warnings;
use threads;
use LWP::UserAgent;
use HTTP::Request;

my $url        = "http://www.google.com";
my $numworkers = 1000;     # total number of requests to make
my %hash;                  # thread handles, keyed by slot number

open my $fh, '>', 'result.txt' or die "Cannot open result.txt: $!";

while ($numworkers) {
    # Launch a batch of 250 worker threads...
    foreach my $dthread (1 .. 250) {
        $hash{$dthread} = threads->create(\&requester, $url);
    }
    # ...then wait for the whole batch to finish before starting the next one.
    foreach my $dthread (1 .. 250) {
        $hash{$dthread}->join();
        $numworkers--;
    }
}

sub requester {
    my $url1    = shift;
    my $request = HTTP::Request->new(GET => $url1);
    my $ua      = LWP::UserAgent->new;
    print "Requesting header....\n\n";
    my $response = $ua->request($request);
    print "Request placed....\n\n";
    my $content = $response->headers_as_string;
    print {$fh} "$content\n\n";
    print ".........thread completed............\n\n";
}

Replies are listed 'Best First'.
Re: Newbie question in threading
by Corion (Patriarch) on Apr 09, 2010 at 11:21 UTC

    See Re: How to create thread pool of ithreads for a good skeleton of how to use a pool of worker threads. I think you'd be better served by not using threads at all, but by driving the communication with an asynchronous HTTP framework such as AnyEvent::HTTP or POE; multiple threads won't gain you much performance, because most of your processing will be limited by the throughput of the network connection between your machine and the server(s).
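
    As a rough illustration of the event-driven approach (a minimal sketch, not production code; the URL list and the per-host connection limit are placeholder assumptions), AnyEvent::HTTP lets you fire off all the requests at once and handle each response in a callback:

        use strict;
        use warnings;
        use AnyEvent;
        use AnyEvent::HTTP;

        my @urls = ('http://www.example.com/') x 250;   # placeholder URL list

        # Raise the per-host connection limit if you really want 250 at once.
        $AnyEvent::HTTP::MAX_PER_HOST = 250;

        my $cv = AnyEvent->condvar;

        for my $url (@urls) {
            $cv->begin;
            http_head $url, sub {
                my ($body, $hdr) = @_;
                print "$url -> $hdr->{Status}\n";   # act on the response here
                $cv->end;
            };
        }

        $cv->recv;   # block until every request has completed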

    Also note that when hitting a host repeatedly, you should throttle your requests: pause at least as long as the last request took before you launch a new one, so that the target machine does not get overloaded.
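
    A minimal sketch of that kind of throttling with LWP (the URL list is just a placeholder):

        use strict;
        use warnings;
        use Time::HiRes qw(time sleep);
        use LWP::UserAgent;

        my @urls = ('http://www.example.com/a', 'http://www.example.com/b');  # placeholders
        my $ua   = LWP::UserAgent->new;

        for my $url (@urls) {
            my $start    = time;
            my $response = $ua->get($url);
            my $elapsed  = time - $start;
            sleep $elapsed;   # pause at least as long as the last request took
        }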

    As a last note, scraping Google is against Google's terms of use, so depending on your jurisdiction you may also be violating local law.

      Thanks, Corion! The Google URL in the code is not for scraping; I honestly respect Google's terms of use. I am going through the alternatives you suggested. Thanks again!
Re: Newbie question in threading
by BrowserUk (Patriarch) on Apr 09, 2010 at 11:36 UTC

    In addition to Corion's notes about scraping Google and hitting servers with 250 concurrent requests: doing a GET when all you need are the headers is horribly wasteful of your target server's resources and bandwidth, as well as your own bandwidth. A HEAD request should be all you need.
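
    A minimal sketch of a HEAD request with LWP::UserAgent (the URL is just a placeholder):

        use strict;
        use warnings;
        use LWP::UserAgent;

        my $ua       = LWP::UserAgent->new;
        my $response = $ua->head('http://www.example.com/');   # headers only, no body

        print $response->headers_as_string, "\n";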

    It's also wasteful to start a new thread just to perform one simple operation each time; reuse a fixed pool of workers instead (a sketch of that idea follows).
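
    Here is a minimal sketch of that idea using threads and Thread::Queue (the URL list and pool size are placeholder assumptions): the 250 workers are created once and pull URLs from a shared queue, so the number of running threads stays constant without spawning a new thread per request.

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;
        use LWP::UserAgent;

        my $POOL_SIZE = 250;                              # constant number of workers
        my @urls = ('http://www.example.com/') x 1000;    # placeholder work list

        my $queue = Thread::Queue->new;

        # Create the pool once; each worker keeps pulling URLs until it sees undef.
        my @pool = map {
            threads->create(sub {
                my $ua = LWP::UserAgent->new;
                while (defined(my $url = $queue->dequeue)) {
                    my $response = $ua->head($url);       # HEAD is enough for headers
                    print "$url -> ", $response->status_line, "\n";
                }
            });
        } 1 .. $POOL_SIZE;

        $queue->enqueue(@urls);                  # feed the work
        $queue->enqueue((undef) x $POOL_SIZE);   # one sentinel per worker to stop it

        $_->join for @pool;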

