in reply to How to download html with threads?

With minimal changes to your existing code, an (untested) threaded solution might look like:

#!/usr/bin/perl -w
use strict;
use threads;
use threads::shared;
use LWP::UserAgent;
use HTTP::Request;

print "Working...\n";

my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)");
$ua->timeout(15);

open URL_PLANETS, '<', "url_planets.txt" or die $!;
my @urls = <URL_PLANETS>;
close(URL_PLANETS);
chomp @urls;

open NAMES, '>>', 'planet_names.txt' or die $!;

my $mutexStdout :shared;
my $mutexFile :shared;
my $running :shared = 0;

foreach my $planet (@urls) {
    async {
        { lock $running; ++$running; }
        { lock $mutexStdout; print "Downloading: " , $planet , "\n" };
        my $req = HTTP::Request->new(GET => $planet);
        my $response = $ua->request($req);
        my $content = $response->content();
        lock $mutexFile;
        print NAMES $content =~ m[Rotations<i>(.*)</i>]m,"\n";
        { lock $running; --$running; }
    }->detach;
    sleep 1 while $running > 10;
}
sleep 1 while $running; ## Let the last 10 finish.
close(NAMES);

Updated: misc++



Re^2: How to download html with threads?
by misc (Friar) on Aug 01, 2007 at 11:42 UTC
    Just for completeness:
    I believe your code should end with
    sleep 1 while $running; close(NAMES);

    I'd always prefer a self-written solution over an existing module, as long as it's not too complicated.
    You learn more that way. Besides, using an existing module can sometimes cost more than writing your own code,
    since you have to learn the API and possibly deal with unexpected behaviour.

    I'm not sure what you're planning to do (30000 planets??), but if you have to fetch the data regularly, it might make sense to save each page's modification time along with your data, and later compare only the modification time of the online page against the locally stored one.
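
A minimal, untested sketch of that modification-time check might look like the following. The names needs_refresh and remote_mtime are made up for illustration, and it assumes the server actually sends a Last-Modified header; when it doesn't, the sketch simply falls back to downloading the page again.

```perl
use strict;
use warnings;

# Decide whether a page has to be downloaded again.
#   $stored - Last-Modified time (epoch seconds) saved on the previous run,
#             or undef if the page has never been fetched
#   $remote - Last-Modified time currently reported by the server,
#             or undef if the server sent no such header
sub needs_refresh {
    my ($stored, $remote) = @_;
    # Without both timestamps there is nothing to compare, so fetch.
    return 1 unless defined $stored && defined $remote;
    return $remote > $stored ? 1 : 0;
}

# Ask the server for the headers only (HEAD request), so no page body
# is transferred. $ua is an LWP::UserAgent object like the one in the
# script above.
sub remote_mtime {
    my ($ua, $url) = @_;
    my $response = $ua->head($url);
    return $response->is_success ? $response->last_modified : undef;
}
```

Inside the download loop you would call remote_mtime($ua, $planet) first and skip the GET whenever needs_refresh returns 0. If you keep a local copy of each page as a file, LWP::UserAgent's mirror method does a similar If-Modified-Since check for you.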