traceyfreitas has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I'm hoping someone has experience with threads::shared and Thread::Pool. I have a large amount of data that I "load balance" across $numCPUs cores (a user-defined parameter), so that if I have:

(i) 10 hrefs in an array; and

(ii) 3 cores;

Core 1 gets: 3 hrefs

Core 2 gets: 3 hrefs

Core 3 gets: 4 hrefs

and they process away. I call:

$_->join() foreach (@threads);

to get the results. The problem with this approach, I've found, is that if the complexity of the hrefs assigned to, say, Core 1 is very low, Core 1 completes its jobs much sooner than the other two cores, and I have a core sitting idle until the other two are done. A potential solution to this (besides attempting to estimate the complexity of the hrefs and load balancing on that feature) was to use Thread::Pool.
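For reference, here is a minimal, self-contained sketch of the static-split approach described above. The worker sub and the hashref contents are hypothetical stand-ins (each href just holds a number, and `workerSub` sums them); the round-robin bucketing reproduces the 4/3/3 split for 10 hrefs on 3 cores:

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: sums the 'n' values of the hashrefs it is given.
sub workerSub {
    my @hrefs = @_;
    my $total = 0;
    $total += $_->{n} for @hrefs;
    return $total;
}

my @hrefs   = map { { n => $_ } } 1 .. 10;   # toy stand-ins for the real hrefs
my $numCPUs = 3;

# Static split: deal the hrefs round-robin into $numCPUs buckets
# (4 + 3 + 3 for 10 hrefs on 3 cores).
my @buckets;
push @{ $buckets[ $_ % $numCPUs ] }, $hrefs[$_] for 0 .. $#hrefs;

# One thread per bucket; join collects each thread's return value.
my @threads = map { threads->create( \&workerSub, @$_ ) } @buckets;
my @results = map { $_->join() } @threads;
```

The weakness is exactly as described: the split is fixed before any work starts, so a thread that draws only cheap hrefs finishes early and idles.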

In this case, I initialize a pool of $numCPUs threads:

my $pool = Thread::Pool->new( { workers => $numCPUs, do => \&workerSub, } );

Then I submit the hrefs (i.e. jobs) to the thread pool, which hands them out to whichever worker is free, supposedly minimizing the time any core sits idle:

foreach my $href (@hrefSubsets) {
    my $jobid = $pool->job( $href, $param1, $param2, $param3 );
    push @threadPool, $jobid;
}

I call result_any() to get the results of whichever jobs finish first:

for ( 1 .. @threadPool ) {
    my $results = $pool->result_any( \my $jobid );   # blocks until any job finishes
}

When all the jobs are done, I shut down the pool:

$pool->shutdown();

Unfortunately, I have not been able to achieve the type of performance I get by using the traditional threads::shared approach. I realize that in my example, there are only 10 jobs to process, but I've tested a subsample (100 jobs) of my actual data (2500 jobs) and it still doesn't perform up to par -- using a load-balanced version is still significantly faster than a pooled approach. Is the overhead of using Thread::Pool really that great?

Some additional info: Thread::Pool doesn't allow the passing of shared variables as parameters to the worker sub, but I can pass a string that identifies the components of the globally shared hash that need to be processed by the thread, so unless I'm missing some subtlety, I'm not creating copies of the (large) shared variable(s).

I'd appreciate any insight anyone may have into this issue.

Thank you!

Replies are listed 'Best First'.
Re: Efficiency of threads::shared versus Thread::Pool
by BrowserUk (Patriarch) on Aug 22, 2011 at 23:27 UTC

    My opinion is that no one should use Thread::Pool -- it is rubbish.

    Rather than hard-wiring the allocation of the hashes to your threads, make the array holding the hash references shared (if it isn't already). Then create a Thread::Queue. Push the indices (0..9 or 0..2499 etc.) onto that Queue. Have your threads read the next index off the Queue and process the associated hash.

    If one thread gets a particularly large one to process, the other threads will continue to read the next index from the queue as soon as they are free, and so the whole thing becomes self-balancing.

    As the hashrefs are in an array, you could avoid the queue by placing the highest index in a shared scalar and have each thread: lock the scalar; read its value; decrement it; unlock it; process the hashref indexed by the value read.
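    That queue-free variant might look like the following sketch. The hrefs here are toy data, and since they are read-only each thread just works on its own copy (for large real data you would share the array, as above):

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @hrefs = map { { n => $_ } } 1 .. 10;   # read-only toy data, copied per thread
my $next :shared = $#hrefs;                # highest unclaimed index

my @threads = map {
    threads->create( sub {
        my $sum = 0;
        while (1) {
            my $i;
            {
                lock($next);               # lock is released at end of block
                $i = $next--;              # claim an index and decrement
            }
            last if $i < 0;                # no work left
            $sum += $hrefs[$i]{n};         # process the claimed hashref
        }
        return $sum;
    } );
} 1 .. 3;

my $total = 0;
$total += $_->join() for @threads;
```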

    But honestly, the Queue mechanism incurs very little overhead and works for any kind of ticketing mechanism, so it probably easier to stick with it.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Efficiency of threads::shared versus Thread::Pool
by zentara (Cardinal) on Aug 23, 2011 at 11:46 UTC
    and they process away. I call: $_->join() foreach (@threads); ...... The problem with this approach, I've found, is that if the complexity of the hrefs assigned to, say, Core 1 is very low, Core 1 completes its jobs much sooner than the other two cores, and I have a core sitting idle until the other two are done.

    Just something to try to free up cores: maybe detach your threads instead of having them all wait for a join. When a detached thread reaches the end of its code block, it should destroy itself and, I would think, free up your core.

    The one big drawback to detaching is once a thread is detached, it may not be joined, and any return data that it might have produced (if it was done and waiting for a join) is lost.
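    One common workaround for that drawback, sketched here with hypothetical toy data, is to have each detached thread write its result into a shared hash instead of returning it, and have the main thread poll until everything has arrived:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %results :shared;
my @hrefs = map { { n => $_ } } 1 .. 10;   # toy stand-ins for the real hrefs

for my $i ( 0 .. $#hrefs ) {
    threads->create( sub {
        my $out = $hrefs[$i]{n} * 2;       # hypothetical work
        lock(%results);
        $results{$i} = $out;               # stash the result before exiting
    } )->detach();                          # detached: cannot be joined
}

# Detached threads cannot be joined, so wait until all results are in.
while (1) {
    { lock(%results); last if keys(%results) == @hrefs; }
    sleep 1;
}
```

    The polling loop is crude (a Thread::Semaphore or cond_wait on the shared hash would be cleaner), but it shows that detaching does not have to mean losing the return data.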


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re: Efficiency of threads::shared versus Thread::Pool
by locked_user sundialsvc4 (Abbot) on Aug 23, 2011 at 14:22 UTC

    Agreed.   Just do it like they do it in any restaurant.   Work-to-do goes into a (thread safe) queue, and whoever happens to be free grabs another unit of work off the queue and starts to prepare it.   Any thread that does not have anything to do goes to sleep, waiting for something to eventually show up in the queue.

    Allocate a “reasonable, and configurable,” number of threads or processes, consistent with your knowledge of the number of cores and of the system’s pragmatic I/O capacity.   It now becomes the operating system’s familiar task to keep the various cores and disk-drives busy.   Thanks to the “throttles” and the “governors” that you have designed into your software contraption, the OS is presented with a requirement that can actually be achieved and sustained.