aravind_v21 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I have a question regarding the use of threads for a research problem of mine. I use perl 5.8.0 on RHEL.
I have a module which takes folder names as arguments. Right now I process these folders consecutively in a loop. Since the processing of each folder is independent of the others, I want to process all the folders simultaneously. I think threads should do the trick. But I need all the threads to finish before I can continue, because I need the output of all of them at the same time. For example,
use strict;
use diagnostics;
use threads;

my @array_of_threads;
my $i = 0;
my $string;

sub first {
    print "called first";
    &second();
}

sub second {
    ## creating 5 threads
    for ($i ; $i < 5 ; $i++) {
        $array_of_threads[$i] = threads->new(\&sub1, "$i");
        print "End of thread " . int($i + 1) . " creation\n";
    }
    &complete();
}

sub complete {
    print "The threads have completed the execution\n";
}

sub sub1 {
    my $j = $_[0];
    print "Thread $j has entered sub1\n";
    &sub2($j);
}

sub sub2 {
    my $k = shift;
    ## loop for thread 3 to show that time for processing the threads may be different
    if ($k == 3) {
        for (my $in = 0 ; $in < 500 ; $in++) {
            print "Thread $k has entered sub2 --- $in\n";
        }
    }
}

&first();
The above is a model of my code. The first sub is called, which in turn calls the second sub (the parent thread). The second sub then creates 5 child threads. Each thread calls sub1 and then sub2.
After all the threads have been created, the complete subroutine is called. I need all the threads to have finished their processing (in parallel) before complete is called. How can I make the complete subroutine run only after all the threads have finished their processing?
If I use join, the threads are processed consecutively rather than in parallel. If I use detach, the complete subroutine is executed before the threads have finished. So how do I do it? Please advise.
Thanks in advance.. Perl really rocks..

Replies are listed 'Best First'.
Re: how to use threads for this problem?
by BrowserUk (Patriarch) on Apr 26, 2007 at 07:30 UTC
    I use perl 5.8.0

    First. If you are serious about using threads, get a later version of perl. At least 5.8.6. Anything earlier than this has too many bugs to be worth the effort of even attempting to use.

    To be clear. Attempting to do anything with threads, even for fun, with 5.8.0 is a pointless exercise in total frustration and doomed to fail.

    How can I make the complete subroutine to be executed only after all the threads have completed their processing? If I have to use join, then the threads are consecutively processed rather than in parallel.

    You are confused. The following modification to your second() sub will ensure that complete() does not get run until all the threads in @array_of_threads have terminated.

    sub second {
        ## creating 5 threads
        for ($i ; $i < 5 ; $i++) {
            $array_of_threads[$i] = threads->new(\&sub1, "$i");
            print "End of thread " . int($i + 1) . " creation\n";
        }
        $_->join for @array_of_threads;    ## Add this
        &complete();
    }

    Where and how were you trying to use join such that you got serialised results?
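
    One guess at an answer, since the post does not show the attempt: join would serialise the work if each thread were joined inside the same loop that creates it. The sketch below contrasts that arrangement with creating every thread first and joining afterwards. The names work, run_serial and run_parallel are made up for illustration, and the short sleep stands in for real folder processing.

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time sleep);

# Hypothetical worker: pretends to process a folder by sleeping briefly.
sub work {
    my ($id) = @_;
    sleep 0.2;
    return $id;
}

# Serialised: each thread is joined before the next one is created,
# so only one worker is ever running at a time.
sub run_serial {
    my ($n) = @_;
    my $start = time;
    for my $id ( 1 .. $n ) {
        threads->new( \&work, $id )->join;
    }
    return time - $start;
}

# Parallel: create every thread first, then join them all.
sub run_parallel {
    my ($n) = @_;
    my $start = time;
    my @threads;
    for my $id ( 1 .. $n ) {
        push @threads, threads->new( \&work, $id );
    }
    $_->join for @threads;
    return time - $start;
}

printf "join inside the creation loop: %.1f seconds\n", run_serial(4);
printf "create all, then join all:     %.1f seconds\n", run_parallel(4);
```

    The first figure should come out at roughly the sum of the individual sleeps and the second at roughly the length of one sleep, which is the difference between serialised and parallel processing.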

    ps. You don't really use variable names like @array_of_threads do you?

    pps. Why use for ($i ; $i<5;$i++) { instead of for( 1 .. 5 ) {?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: how to use threads for this problem?
by quester (Vicar) on Apr 26, 2007 at 07:42 UTC
    The threads will be processed in parallel if you start all of them first and then, in a separate loop, join each one.

    I changed the tasks to sleep for variable lengths of time (with rand and Time::HiRes::sleep) to make the output a bit clearer. I also removed the ampersands from subroutine calls like &complete(). One other change: the way you are doing loops doesn't initialize the counter correctly. "for ($i ; $i<5;$i++)" should be "for ($i=0; $i<5; $i++)", or even better "for $i (0..4)".
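
    To see why the missing initialisation matters, here is a small sketch without any threads. It assumes the same file-scoped $i as the original post; count_without_reset and count_with_range are made-up names. The first form only works the first time it is called, because $i keeps its final value between calls.

```perl
use strict;
use warnings;

my $i = 0;    # file-scoped counter, as in the original post

# Mirrors the original loop: $i is never reset, so a second call finds
# $i already at 5 and the body never runs. (perl may warn about the bare $i.)
sub count_without_reset {
    my $passes = 0;
    for ( $i ; $i < 5 ; $i++ ) { $passes++ }
    return $passes;
}

# A lexical loop variable over a range starts fresh on every call.
sub count_with_range {
    my $passes = 0;
    for my $n ( 0 .. 4 ) { $passes++ }
    return $passes;
}

print "without reset: ", count_without_reset(), " then ", count_without_reset(), "\n";
print "with range:    ", count_with_range(),    " then ", count_with_range(),    "\n";
# prints "without reset: 5 then 0" and "with range:    5 then 5"
```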

    Try the following:

    use strict;
    use diagnostics;
    use threads;
    use Time::HiRes;

    my @array_of_threads;
    my $i = 0;
    my $string;

    sub first {
        print "called first\n";
        second();
    }

    sub second {
        ## creating 5 threads
        for ( $i = 0 ; $i < 5 ; $i++ ) {
            $array_of_threads[$i] = threads->new( \&sub1, "$i" );
            print "End of thread " . int( $i + 1 ) . " creation\n";
        }
        ## waiting for all 5 threads to complete
        for ( $i = 0 ; $i < 5 ; $i++ ) {
            print "Waiting for thread $i.\n";
            $array_of_threads[$i]->join();
            print "Joined thread $i.\n";
        }
        complete();
    }

    sub complete {
        print "The threads have completed execution\n";
    }

    sub sub1 {
        my $j = $_[0];
        print "Thread $j has entered sub1\n";
        sub2($j);
    }

    sub sub2 {
        my $k = shift;
        my $n = 10 * rand;
        print "Thread $k is sleeping for $n seconds.\n";
        Time::HiRes::sleep($n);
        print "Thread $k done.\n";
    }

    first();
Re: how to use threads for this problem?
by BrowserUk (Patriarch) on Apr 26, 2007 at 08:15 UTC

    A simple demonstration of processing subdirs in parallel. This just accumulates the number of bytes used by the files in the list of (relative) directories supplied on the command line, but it may give you a start in what you are trying to do.

    #! perl -slw
    use strict;
    use threads;

    sub worker {
        my( $dir ) = @_;
        my $total = 0;
        opendir my $dh, $dir or die "$dir : $!";
        while( my $file = readdir $dh ) {
            $total += -s "$dir/$file";
        }
        closedir $dh;
        return $total;
    }

    my @threads;
    for my $dir ( @ARGV ) {
        if( -d $dir ) {
            push @threads, threads->new( \&worker, $dir );
        }
    }

    my $total;
    $total += $_->join for @threads;

    print "Total bytes in files in directories [@ARGV] is: ", $total;

    __END__
    C:\test>612156 3 4 5 data
    Total bytes in files in directories [3 4 5 data] is: 654203874

    You don't say what information from processing the directories you need for the subsequent processing. If it is much more than one or two values, it might be better to return it to the main thread via a queue or other shared data structure rather than via return/join. For a single item, or a small list of items, returning it this way is okay.
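
    As a rough sketch of the queue approach, the following uses the standard Thread::Queue module. The record layout ("dir<TAB>file<TAB>bytes") and the names worker and collect are assumptions made for illustration, not anything from the original problem. Records are plain strings, which keeps the example independent of how a given Thread::Queue version handles references.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Shared queue: every worker writes to it, the main thread reads from it.
my $results = Thread::Queue->new;

# Hypothetical worker: enqueues one "dir<TAB>file<TAB>bytes" record
# for each plain file in the directory.
sub worker {
    my ($dir) = @_;
    opendir my $dh, $dir or die "$dir : $!";
    while ( defined( my $file = readdir $dh ) ) {
        my $path = "$dir/$file";
        next unless -f $path;
        my $bytes = ( -s $path ) || 0;
        $results->enqueue( join "\t", $dir, $file, $bytes );
    }
    closedir $dh;
    return;
}

# Start one thread per directory, wait for all of them,
# then drain whatever they left on the queue.
sub collect {
    my @dirs = grep { -d $_ } @_;
    my @threads;
    push @threads, threads->new( \&worker, $_ ) for @dirs;
    $_->join for @threads;

    my @records;
    while ( defined( my $record = $results->dequeue_nb ) ) {
        push @records, $record;
    }
    return @records;
}

print "$_\n" for collect(@ARGV);
```

    Here the queue is only drained after every join, which matches the requirement that all output be available together. The main thread could instead read records while the workers are still running if results are needed sooner.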

