jey has asked for the wisdom of the Perl Monks concerning the following question:

Dear Fellow monks,

I am trying to build a Pool of threads, using the Thread::Pool module.
I don't like to waste people's time, and I know there is already a similar node here (Building a thread pool). I've looked at it, and it helps in some ways.

Starting from the example in the module's documentation, I tried this code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Thread::Pool;

    my %resolved : shared;

    my $pool = Thread::Pool->new(
        {
            workers => 3,
            do      => \&do,
            monitor => \&monitor,
        }
    );

    $pool->job($_) for 1 .. 100;
    $pool->shutdown;

    print $_, "\t", $resolved{$_}, $/ for sort { $b <=> $a } keys %resolved;

    sub do {
        $resolved{$_[0]} = $_[0] * $_[0];
        $_[0];
    }

    sub monitor { return 1 }

And it's great, it works...
The only problem is that whenever I move away from trivial examples, it does not work anymore.
So here is another piece of code that DOESN'T work. I'd be grateful to anybody who can tell me why.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Thread::Pool;

    my %resolved : shared;

    my $pool = Thread::Pool->new(
        {
            workers => 3,
            do      => \&do,
            monitor => \&monitor,
        }
    );

    $pool->job($_) for 1 .. 100;
    $pool->shutdown;

    print join("\t", ($_, $resolved{$_}->{2}, $resolved{$_}->{3}, $resolved{$_}->{4})), $/
        for sort { $b <=> $a } keys %resolved;

    sub do {
        my %sh;
        $sh{2} = $_[0] * $_[0];
        $sh{3} = $_[0] * $_[0] * $_[0];
        $sh{4} = $_[0] * $_[0] * $_[0] * $_[0];
        $resolved{$_[0]} = \%sh;
        $_[0];
    }

    sub monitor { return 1 }

I am running FC5 with Perl 5.8.8 on a dual Xeon machine. The error message I get is:

    Cannot find result for streaming job 1 at /usr/lib/perl5/site_perl/5.8.8/Thread/Pool.pm (loaded on demand from offset 20056 for 2428 bytes) line 769.

Thanks in advance,

Update :
I think I haven't been clear enough about what I would like.
I'd like to build a work crew model of threads, where I could submit 1000s of jobs at the entry, have, e.g., 4 threads running at the same time, and an output queue that returns the results whenever they are ready.
Hope that is helpful for those who might have some insights.
Thanks.

Update 2: (mainly for BrowserUk)
Thanks for your genuine help.
I work at a university, so nothing is proprietary.
Basically, I want to read in a file that contains some data and treat this data as follows:
Since the initial file contains, e.g., 500 different pieces of data to be treated and every treatment takes ~5 min, I'd like to be able to use the 4 processors on the Opteron machines to speed up the whole process by a factor of, e.g., 4.
Now, I am not super familiar with threads, but I can make threading more or less work. What I have a problem with is how to stack 1000 jobs in a queue and have 3 or 4 threads running at a time. Whenever one of them finishes, it should return its result and a thread should start the next job waiting on the queue, and so on until all the jobs are completed.
Thanks again
--
jey

Re: Building a Pool of threads
by renodino (Curate) on Aug 23, 2006 at 23:17 UTC
    While I can't claim it's directly related, have you:

    1. installed the latest version of threads and threads::shared ?
    2. adjusted your default thread stack size ?
    Item 1 fixes quite a few bugs and cleans up some issues (kudos to jdhedden for his valiant efforts). Item 2 may help improve your ability to run 100 threads, as the stock Perl on FC/RedHat allocates 10 MB of stack space for each thread, which is very likely far more than you need (see the "Use more threads." thread for details).
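
For item 2, here is a minimal sketch of what adjusting the default stack size can look like. It assumes a release of the threads module recent enough to support the `stack_size` import option and `threads->get_stack_size()` (another reason to upgrade, per item 1). The 256 KiB figure is purely illustrative, not a recommendation.

```perl
use strict;
use warnings;

# Request a smaller per-thread stack when the module is loaded (bytes).
# 256 KiB is an assumed, illustrative value; tune it for your own code.
use threads ( 'stack_size' => 256 * 1024 );

# Ask the module which stack size new threads will get. The system may
# round the requested value up, so treat this as a lower bound.
my $stack_size = threads->get_stack_size();
print "stack size for new threads: $stack_size bytes\n";

# Start one thread with that stack size and collect its return value.
my $thr    = threads->create( sub { return $_[0] * 2 }, 21 );
my $answer = $thr->join;
print "worker returned $answer\n";
```

If a worker needs deep recursion or large local data, the value has to be raised accordingly.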

    Update:
    My bad. I now see you're only using a few threads, so the stack shouldn't be an issue...but it might be worthwhile to install the latest version anyway.

Re: Building a Pool of threads
by BrowserUk (Patriarch) on Aug 23, 2006 at 22:47 UTC

    Are you set on using Thread::Pool?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Well it seems like the easier way. But I've seen your code on Building a thread pool.
      Do you think it might work better?
      I don't really care about locking, and could just submit all my jobs and wait for all of them to be finished..

        Well, I've never managed to get Thread::Pool to work on my system. I just tried to reinstall the whole dependency chain from scratch and got test failures all the way up the chain, from the load module onward, ending with this at the top of the chain:

        t\Pool06....dubious
                Test returned status 1 (wstat 256, 0x100)
        DIED. FAILED tests 1-21
                Failed 21/21 tests, 0.00% okay
        Failed Test Stat Wstat Total Fail  Failed  List of Failed
        -------------------------------------------------------------------------------
        t\Pool01.t     1   256    42   83 197.62%  1-42
        t\Pool02.t     1   256   337  673 199.70%  1-337
        t\Pool03.t     1   256   403  803 199.26%  1 3-403
        t\Pool04.t     1   256   202  403 199.50%  1-202
        t\Pool05.t     1   256    37   73 197.30%  1-37
        t\Pool06.t     1   256    21   41 195.24%  1-21
        Failed 6/6 test scripts, 0.00% okay. 1041/1042 subtests failed, 0.10% okay.
        NMAKE : fatal error U1077: 'C:\Perl\bin\perl.exe' : return code '0x2'
        Stop.

        With the greatest of respect to the author of these modules, I find them over-complex for what they attempt to do.

        The code I posted in that thread has never been developed further because I found it impossible to write a single, simple & consistent interface that would cater for the many possible scenarios in which a "pool of threads" might be used. I'm also of the opinion that the more general that you make the interface to a library, the less usable it becomes, and the less effective it is for any given application. To that end, I prefer to use the basic threads/threads::shared (and often Thread::Queue) directly to construct the infrastructure I need for each particular application. I have yet to see sufficient pattern across a range of applications for thread pools to warrant the abstraction of a common part of it out into a separate library.
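
To make the "use threads and Thread::Queue directly" approach concrete, here is a minimal sketch of a work crew: a fixed number of workers pull jobs from an input queue and push results onto an output queue. This is not the code from the earlier thread. `process_item`, the worker count, and the tab-separated result format are assumptions made for illustration, and `process_item` stands in for the real 5-minute treatment.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $WORKERS = 4;                     # assumed crew size
my $jobs    = Thread::Queue->new;    # input queue
my $results = Thread::Queue->new;    # output queue

# Hypothetical stand-in for the real, expensive treatment of one item.
sub process_item {
    my ($n) = @_;
    return join "\t", $n, $n**2, $n**3, $n**4;
}

# Each worker blocks in dequeue until a job arrives; undef means "stop".
sub worker {
    while ( defined( my $item = $jobs->dequeue ) ) {
        $results->enqueue( process_item($item) );
    }
    return;
}

my @crew = map { threads->create( \&worker ) } 1 .. $WORKERS;

my @items = 1 .. 100;
$jobs->enqueue(@items);
$jobs->enqueue( (undef) x $WORKERS );    # one stop marker per worker

# Results arrive in completion order, one per submitted item.
my @lines;
push @lines, $results->dequeue for @items;

$_->join for @crew;

# Sort by the first field, highest first, as in the original script.
print "$_\n"
    for sort { ( split /\t/, $b )[0] <=> ( split /\t/, $a )[0] } @lines;
```

Results are passed back as plain strings so that nothing but simple scalars crosses the thread boundary. Only `$WORKERS` jobs run at any one time, however many are queued.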

        I don't really care about locking, ...

        You'll have to explain what you mean by that.

        Why do you not care about locking? Are you saying that each of your threads will only use independent data and so there is no need for locking?
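
On the question of independent data, the second script in the original post is worth a closer look. Each job writes only its own key of %resolved, but the value it stores is a reference to an ordinary, unshared hash. threads::shared refuses that assignment ("Invalid value for shared scalar"), which may well be why the job never produces a result. The sketch below uses plain threads rather than Thread::Pool and declares the inner hash shared as well. `record_powers` is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %resolved : shared;

# Store the powers of $n under its own key. The inner hash is declared
# shared, so a reference to it may be stored in the shared outer hash.
sub record_powers {
    my ($n) = @_;
    my %powers : shared;
    $powers{2} = $n**2;
    $powers{3} = $n**3;
    $powers{4} = $n**4;
    $resolved{$n} = \%powers;
    return $n;
}

my @threads = map { threads->create( \&record_powers, $_ ) } 1 .. 3;
$_->join for @threads;

# Print each key with its three powers, highest key first.
print join( "\t", $_, @{ $resolved{$_} }{ 2, 3, 4 } ), "\n"
    for sort { $b <=> $a } keys %resolved;
```

Because every thread touches a different key, no explicit lock() is used here. Code in which several threads update the same element would need one.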

        ... and could just submit all my jobs and wait for all of them to be finished..

        That statement suggests that you should not be considering threads at all.

        On a dual-processor machine, the probability is that if you ran all your code serially, it would finish more quickly than if you started 100 threads simultaneously. Although the serial code would only utilise one CPU (at a given time), the OS would not have to task-switch each CPU between 50 threads.

        The code you posted was obviously only an example, as it does nothing that requires or benefits from threads. If you were to post a synopsis of the actual application, then it would be possible to advise on strategies you might consider.

