saranrsm has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I need to parallelize the process of finding the positions of various patterns that have been collected and stored in a hash (refer to the script I have attached). If I don't parallelize the search, each pattern has to wait in the queue until the preceding pattern is completed. I don't know whether to use threads or forks to achieve this parallelization.

I have attached my script, in which I have tried to use Perl ithreads to accomplish the parallelization (at line 25, inside the foreach loop). I don't think the threads I have implemented work properly, because there is no visible improvement over the script run without threads.

In this paragraph I have tried to explain the objective of my work: I read in a string, break it into substrings of a specified length (which may vary), and find the positions of those substrings.

use threads ;
use feature "say";
use warnings;
my $input_file = "text";    #Contains a single lined string without any newline character
open my $hd, "<", $input_file or die "Couldn't open '$input_file' - $!";
my $text=<$hd>;
chomp($text);
my $window=5;               #Length of the window
my $i=0;
LOOP:seek $hd,$i,0;         #Here seek and read give us the first $window lengthed substing runned over loop
read ($hd,my $substring,$window);
my %hash;
if(length($substring)==$window)
{
    $hash{$substring}=$i;   #$substring pushed into a hash, here I have avoided array has there may be repeated substrings in the $text
    $i++;
    goto LOOP;
}
my $count = keys %hash;
my $j=0;
foreach my $pattern (keys %hash)
{
    if($j<=$count)
    {
        my $thr="thread".$j;
        $$thr=threads->create(\&position,$text,$pattern);
        $j++;
    }
}
for($ii=0;$ii<=$count;$ii++)
{
    $thr="thread".$ii;
    print $thr,"\n";
    $$thr->join();
}
sub position
{
    my ($text,$pattern) = @_ ;
    my $offset = 0;
    print "\n$pattern ($window)\n";
    print '~' x $window,"\n";
    my $pos=index $text,$pattern,$offset;
    while ($pos != -1)
    {
        print $pos+1," to ",$pos+$window,"\n";
        $offset = $pos + 99;
        $pos = index($text, $pattern, $offset);
    }
}

Replies are listed 'Best First'.
Re: Trouble implementing threads
by zentara (Cardinal) on Oct 20, 2011 at 15:13 UTC
    Here is your basic thread joining loop. Remove your join from your creation sub, and put it in a waiting sub at the end of your main script.
    for($ii=0;$ii<=$count;$ii++)
    {
        $thr="thread".$ii;
        print $thr,"\n";
        # $$thr->join();   # don't join here
    }

    Make a loop that waits for each thread as they finish:

    # JOIN ALL THREADS
    # untested, but close :-)
    my @returns;
    while(1)
    {
        foreach my $thread (threads->list)
        {
            if ($thread->is_joinable() )
            {
                push @returns, $thread->join;
            }
        }
        # 'last' sits outside the foreach so that it leaves the while loop
        if( scalar (threads->list) == 0 ) { last }
    }
    print "@returns\n";
    __END__

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh

      Dear monks, I have modified the script as per zentara's advice, but now it goes into an endless loop and ultimately gets killed. Can someone help me with this?

      use threads ;
      use feature "say";
      #use strict;
      use warnings;
      `tr -d '\n' </home/guru/Desktop/sequence.fasta >modified_human_chr_1.txt`;
      my $input_file = "modified_human_chr_1.txt";
      open my $hd, "<", $input_file or die "Couldn't open '$input_file' - $!";
      my $text=<$hd>;
      chomp($text);
      my $window=5;
      my $i=0;
      LOOP:seek $hd,$i,0;
      read ($hd,my $substring,$window);
      my %hash;
      if(length($substring)==$window)
      {
          $hash{$substring}=$i;
          $i++;
          goto LOOP;
      }
      my $count = keys %hash;
      my $j=0;
      foreach my $pattern (keys %hash)
      {
          if($j<=$count)
          {
              my $thr="thread".$j;
              $$thr=threads->create(\&hunter,$text,$pattern);
              $j++;
          }
      }
      my @returns;
      foreach $thread (threads->list)
      {
          if ($thread->is_joinable() )
          {
              push @returns, $thread->join;
          }
          if( scalar (threads->list) == 0 ) {last;}
      }
      print "@returns\n";
      sub hunter
      {
          my ($text,$pattern) = @_ ;
          my $offset = 0;
          print "\n$pattern ($window)\n";
          print '~' x $window,"\n";
          my $pos=index $text,$pattern,$offset;
          while ($pos != -1)
          {
              print $pos+1," to ",$pos+$window,"\n";
              $offset = $pos + 99;
              $pos = index($text, $pattern, $offset);
          }
      }

        To be honest, I can't understand what you are doing in your code with the goto LOOP and the seek. Your first step should be to see how many threads are being created; you may have a thread bomb in your code. How many threads does the printout of $j report, and how much memory are you consuming? Try watching the script with top to see how much memory it uses.

        You should strip down your code, to the point where you are just creating and joining threads which do nothing. When you get that to work, start adding your math routines, and see where it starts to break down.
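
A minimal sketch of that stripped-down starting point, assuming a perl built with ithreads. The thread count `$n` and the `noop` worker are hypothetical names used only for illustration; the worker does no real work and just returns its id, so the only thing being exercised is thread creation and joining.

```perl
use strict;
use warnings;
use threads;

my $n = 4;    # hypothetical thread count for the test

# A worker that does nothing except hand back its own id.
sub noop {
    my ($id) = @_;
    return $id;
}

# Create all threads first, keeping the thread objects in an array
# rather than in symbolic references.
my @threads = map { threads->create({ context => 'scalar' }, \&noop, $_) } 1 .. $n;
print "created ", scalar(@threads), " threads\n";

# Join them only after every thread has been started.
my @ids = map { $_->join() } @threads;
print "joined: @ids\n";
```

Once this runs cleanly, the real search routine can replace `noop`, and the printed thread count shows immediately whether far more threads are being created than intended.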


Re: Trouble implementing threads
by SuicideJunkie (Vicar) on Oct 20, 2011 at 13:31 UTC
    $$thr=threads->create(\&position,$text,$pattern);
    $$thr->join();

    That looks like you're:

    1. Starting a new thread
    2. waiting for it to end
      • waiting for it to initialize
      • waiting for it to do the work
      • waiting for it to clean up
    3. Getting the result.
    4. Loop back to 1

    That kind of thing is going to slow down your program with all the thread overhead. You need to create a small set of worker threads, and distribute the work between them.
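
One way to sketch the "small set of worker threads" idea is with the core Thread::Queue module. This is an illustration under stated assumptions, not the poster's code: the sample `$text`, the `@patterns` list, the `$workers` count and the `worker` sub are all hypothetical, and the search advances by one character after each hit, which differs from the `$pos + 99` step in the original script.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-ins for the poster's data.
my $text     = 'ACGTACGTTACGTACGA';
my @patterns = ('ACGTA', 'CGTAC', 'TACGT');
my $workers  = 4;    # assumption: roughly one worker per core

my $queue = Thread::Queue->new();

# Each worker pulls patterns off the shared queue until it sees an
# undef sentinel, then returns one "pattern: positions" string per
# pattern it handled (positions are 0-based offsets).
sub worker {
    my ($text) = @_;
    my @found;
    while (defined(my $pattern = $queue->dequeue())) {
        my @positions;
        my $pos = index($text, $pattern);
        while ($pos != -1) {
            push @positions, $pos;
            $pos = index($text, $pattern, $pos + 1);
        }
        push @found, "$pattern: @positions";
    }
    return @found;
}

# Start a fixed pool, hand out the work, then send one sentinel per worker.
my @pool = map { threads->create({ context => 'list' }, \&worker, $text) } 1 .. $workers;
$queue->enqueue(@patterns);
$queue->enqueue((undef) x $workers);

# Collect the results once every worker has drained the queue.
my @results = sort map { $_->join() } @pool;
print "$_\n" for @results;
```

The number of threads stays fixed at `$workers` no matter how many patterns the hash holds, so the per-thread start-up cost is paid only a handful of times.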

Re: Trouble implementing threads
by zentara (Cardinal) on Oct 20, 2011 at 14:20 UTC
    I don't think the threads I have implemented work properly, because there is no visible improvement over the script run without threads.

    Do you have a multi-core computer set up to run as multiple processors? If not, and you essentially have a single CPU, you can only process one thread at a time, and there will be no speed improvement over just running your math sequentially in a single subroutine. It's a common misconception that threads speed up parallelized routines on single-CPU machines. In some cases they can help, for example when there are multiple socket calls out to the internet, but in general text processing or number crunching no gain is made. You probably actually slow things down by threading on a single-CPU computer, due to the overhead that threads impose.
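
A rough way to check how many processors the operating system exposes is to count the `processor` entries in `/proc/cpuinfo`. This sketch assumes a Linux system; the `count_cpus` helper is a hypothetical name, and it returns 0 when the file cannot be read rather than guessing.

```perl
use strict;
use warnings;

# Count logical CPUs by counting "processor : N" lines in a
# Linux-style cpuinfo file. Returns 0 if the file cannot be opened.
sub count_cpus {
    my ($path) = @_;
    $path = '/proc/cpuinfo' unless defined $path;    # assumption: Linux
    open my $fh, '<', $path or return 0;
    my $count = grep { /^processor\s*:/ } <$fh>;
    close $fh;
    return $count;
}

my $cpus = count_cpus();
print $cpus
    ? "Logical CPUs: $cpus\n"
    : "Could not read /proc/cpuinfo\n";
```

If this reports 1, CPU-bound work such as the pattern search is unlikely to get faster with threads.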

    Threads still can be useful on single cpu machines, but it is not for speed improvements; it is for things like inter-process communication and making routines non-blocking.


      I have a 64-bit Intel Core 2 Quad processor with 4 GB RAM; won't that be enough?