tekio has asked for the wisdom of the Perl Monks concerning the following question:

I'm really having a difficult time finding a good example of how to do this. In all the posts I find using Google, the poster has asked the question and just been told that the way they (and I) are threading the application is incorrect.

But nobody has given an example of a proper way to do this. Also, most of the PERL threading tutorials I find are just printing out something from a thread, in a "hello world" type example.

As a random project, I have created an application that reads a list of words from a text file. It then uses LWP or IO::Socket to check and see if they are a valid domain on the Interwebz. (yes, I could use whois, but that is not the point).

After about 5 minutes, all my swap space is eaten up and the o/s comes to a screeching halt. I understand why. All my threads are taking up resources not being freed. I am doing something wrong.

What I want to do is read the text file, use a variable to define how many threads can be active at one time, get back the memory from those threads, and then run $n more threads at a time.

Below is my failed attempt. (I understand the code below probably sucks in all sorts of ways. Sorry if anyone is offended.)

    #!/usr/bin/perl
    use LWP::UserAgent;
    use threads;
    use threads::shared;

    our $totalCount :shared;
    $totalCount = 1;
    $count = 1;
    $k = 1;
    $thrCount = 5;
    $timeout = 5;
    $start = localtime();

    for ($i = 0; $i <= $#ARGV; $i++) {
        if ($ARGV[$i] eq "\-t") {
            $thrCount = $ARGV[$i+1];
        }
        elsif ($ARGV[$i] eq "\-i") {
            $infile = $ARGV[$i+1];
        }
    }

    open(CONTENTS, "<$infile") or die("Could not open file!");
    while ($input = <CONTENTS>) {
        chomp $input;
        print "trying..... " . $input . "\n";
        print " Total Count: " . $totalCount . " Count: " . $count . "\n";
        $thr = threads->new(\&tryHTTP, $input);
        $thr->detach();
        $count++;
        $k++;
        if ($k % $thrCount == 0) {
            sleep($timeout);
            $k = 1;
        }
    }
    close(CONTENTS);

    $end = localtime();
    print $start . "\n";
    print $end . "\n";
    print $totalCount . "/ " . $count . "\n";

    sub tryHTTP {
        $ua = LWP::UserAgent->new;
        $ua->timeout((10));
        $url = "http://www." . $_[0] . ".com";
        $response = $ua->get($url);
        if ($response->is_success) {
            $totalCount++;
            $response = "";
        }
    }

P.S. Merry Christmas!

Replies are listed 'Best First'.
Re: Proper way to thread this in PERL.
by jethro (Monsignor) on Dec 25, 2013 at 20:33 UTC

    I'm no threads expert, but one problem I see in your code is that you use sleep as a crude limit on the number of threads, in the hope that 5 threads per 5 seconds is not too many. It would be much better to check the number of threads with scalar(threads->list()) and sleep as long as this number is at the maximum. Only when that number goes down should another thread be created.

    Something like this:

        # Note: threads->list() counts only threads that are neither
        # joined nor detached.
        while (scalar(threads->list()) >= $threadsmax) {
            sleep(1);   # or threads->yield; but I'm not sure what happens if you call this in the main thread
        }
        print ".";

    I added the print for debugging purposes. If you see that no new "." get printed after a while it would mean no threads finish.
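
    A fuller sketch of that throttling idea, for illustration only. The word list, the thread limit and the check_word routine are hypothetical stand-ins (check_word just pauses briefly instead of making an HTTP request). It assumes the threads are joined rather than detached, since threads->list() does not see detached threads, and joining is also what releases a finished thread's memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical settings and input, chosen only for illustration.
my $max_threads = 3;
my @words       = qw(alpha beta gamma delta epsilon zeta eta);

my $done :shared = 0;   # number of words processed so far
my $peak = 0;           # highest thread count seen by the main thread

# Hypothetical stand-in for the real HTTP check.
sub check_word {
    my ($word) = @_;
    select(undef, undef, undef, 0.05);   # pretend to wait on the network
    lock($done);
    $done++;
    return;
}

for my $word (@words) {
    # Wait for a free slot, joining finished threads so that their
    # resources are released.
    while (scalar(threads->list()) >= $max_threads) {
        $_->join() for threads->list(threads::joinable);
        select(undef, undef, undef, 0.01);   # short nap instead of spinning
    }
    threads->create(\&check_word, $word);

    my $live = scalar(threads->list());
    $peak = $live if $live > $peak;
}

# Join whatever is still running.
$_->join() for threads->list();

print "processed $done of ", scalar(@words), " words; peak threads: $peak\n";
```

    This still creates one thread per word, so it only caps how many exist at once; it does not avoid the cost of creating them.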

      Thank you! I will give that a try!
Re: Proper way to thread this in PERL.
by Preceptor (Deacon) on Dec 25, 2013 at 23:34 UTC

    I would strongly suggest avoiding spawning new threads in a 'while' loop. That just causes you grief. Each time you 'create' a thread, it copies your program state, which becomes a particular problem when you're loading lots of modules. Instead, I would advocate a 'worker thread' approach, and use Thread::Queue to 'feed' them.

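        use threads;
        use Thread::Queue;
        use LWP::UserAgent;

        my $url_q = Thread::Queue -> new();

        sub http_fetch_thread {
            my $ua = LWP::UserAgent -> new();
            $ua -> timeout ( 10 );
            while ( my $item = $url_q -> dequeue() ) {
                chomp $item;   # lines read from the file still carry their newline
                my $url = "http://www." . $item . ".com";
                <... fetch stuff ... >
            }
        }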

    Spawn a defined number of these threads (you look like you're trying to keep to 5?) and then just feed the contents of your file, into '$url_q'.

        for ( 1..$thrCount ) {
            threads -> create ( \&http_fetch_thread );
        }

        open ( my $contents, "<", $contents_file_name ) or die $!;
        $url_q -> enqueue ( <$contents> );
        $url_q -> end;
        close ( $contents );

        #wait for completion
        foreach my $thr ( threads -> list() ) {
            $thr -> join(); #not capturing return code, we started in a void context.
        }
        print "At program completion, total count was ", $totalCount,"\n";

    So rather than starting a new thread for every line of your file, each of which instantiates a new 'useragent' object, you'll create only as many threads and useragents as the thread count you defined. They then run through the list as fast as they can, and probably won't chew up your memory anything like as badly (and because you're not creating/destroying useragents and threads, you'll probably find it runs a lot faster too).
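
    Put together as one self-contained script, the worker approach might look like the sketch below. To keep it runnable without a network, the HTTP request is replaced by a hypothetical looks_valid routine that simply treats words with an even number of letters as "valid", and the word list and pool size are made up. It also takes a lock before incrementing the shared counter, which is an addition of mine rather than something from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# Hypothetical pool size and input, chosen only for illustration.
my $worker_count = 3;
my @words        = qw(perl monks example thread queue);

my $total_count :shared = 0;
my $word_q = Thread::Queue->new();

# Hypothetical stand-in for the HTTP request: a word counts as
# "valid" when it has an even number of letters.
sub looks_valid {
    my ($word) = @_;
    return length($word) % 2 == 0;
}

sub worker {
    # defined() keeps the loop going for false items such as "0";
    # dequeue() returns undef once the queue is ended and empty.
    while (defined(my $word = $word_q->dequeue())) {
        chomp $word;
        if (looks_valid($word)) {
            lock($total_count);   # serialise updates to the shared counter
            $total_count++;
        }
    }
    return;
}

# Start a fixed number of workers, then feed them.
my @workers = map { threads->create(\&worker) } 1 .. $worker_count;

$word_q->enqueue(@words);
$word_q->end();   # needs Thread::Queue 3.01 or later

$_->join() for @workers;

print "valid: $total_count of ", scalar(@words), "\n";
```

    With a real fetch in place of looks_valid, each worker would create its own LWP::UserAgent once, before entering the loop.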

    If you want a running total of 'totalCount' you can either print it from within the thread, or instead of doing the 'foreach/join' loop, do a 'while' loop:

        while ( threads -> list ) {
            foreach my $thread ( threads -> list ( threads::joinable ) ) {
                $thread -> join();
            }
            print $totalCount,"\n";
            sleep 5;
        }

    Edit: More generally I'd suggest:

    • Three-argument 'opens' with a lexical filehandle are nicer, especially when threading. (When is 'CONTENTS' in scope?)
    • Detaching a thread can mean your program completing without the thread finishing. That can create anomalous results, so I'd suggest avoiding it generally
    • Parsing '@ARGV' by hand is a good way to introduce bugs. Look at Getopt::Std for anything more than very trivial cases.
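
    As a rough illustration of the last point, the -t and -i switches from the original script could be read with Getopt::Std along these lines. The parse_args wrapper, the default of 5 threads and the file name words.txt are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parse "-t <threads> -i <file>" from a list of
# arguments and return the options as a hash.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;   # getopts() reads from @ARGV
    my %opts;
    getopts('t:i:', \%opts)
        or die "Usage: $0 [-t threads] -i infile\n";
    $opts{t} //= 5;        # illustrative default thread count
    return %opts;
}

# In a real script this would be parse_args(@ARGV).
my %opts = parse_args(qw(-t 8 -i words.txt));
print "threads: $opts{t}, input file: $opts{i}\n";
```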
      These are excellent replies! Thank you guys so much! :)
Re: Proper way to thread this in PERL. (mech lwp asynchronous)
by Anonymous Monk on Dec 25, 2013 at 23:32 UTC
Re: Proper way to thread this in PERL.
by locked_user sundialsvc4 (Abbot) on Dec 26, 2013 at 03:06 UTC

    There’s definitely a “right way” and a “wrong way” to use threading/processes, IMHO, no matter what programming language(s) you are using to do the job.   A thread or process is a worker, not a unit of work.   The best example of concurrency in action is ... a fast-food restaurant.