traceyfreitas has asked for the wisdom of the Perl Monks concerning the following question:
I'm hoping someone has experience with threads::shared and Thread::Pool. I have a large amount of data that I "load balance" across the available $numCPUs (user-defined parameter) so that if I have:
(i) 10 hrefs in an array; and
(ii) 3 cores;
    Core 1 gets: 3 hrefs
    Core 2 gets: 3 hrefs
    Core 3 gets: 4 hrefs

and they process away. I call:

    $_->join() foreach (@threads);

to get the results. The problem with this approach, I've found, is that if the hrefs assigned to, say, Core 1 are so simple that Core 1 completes its jobs much sooner than the other two cores, I have a core sitting idle until the other two are done. A potential solution (besides trying to estimate the complexity of each href and load balancing on that) was to use Thread::Pool.
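For reference, here is a minimal sketch of the static split described above, using only the core threads module. The names partition and process_item, and the numeric items standing in for hrefs, are hypothetical placeholders rather than the original code; the point is only to show the 3/3/4 split and the join() step.

```perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the real per-href work.
sub process_item { my ($item) = @_; return $item * 2 }

# Split @$items into $n contiguous chunks; the last chunk absorbs the remainder.
sub partition {
    my ($items, $n) = @_;
    my $base = int(@$items / $n);
    my @chunks;
    for my $i (0 .. $n - 1) {
        my $start = $i * $base;
        my $end   = $i == $n - 1 ? $#$items : $start + $base - 1;
        push @chunks, [ @{$items}[ $start .. $end ] ];
    }
    return @chunks;
}

my $numCPUs = 3;
my @items   = (1 .. 10);
my @chunks  = partition(\@items, $numCPUs);

my @threads;
for my $chunk (@chunks) {
    # Created in scalar context, so join() returns the single array ref.
    my $thr = threads->create(sub { [ map { process_item($_) } @$chunk ] });
    push @threads, $thr;
}

# join() blocks until each thread is done, however uneven the chunks were.
my @results = map { @{ $_->join() } } @threads;
print scalar(@results), " results\n";
```

Because every chunk is fixed before any thread starts, a thread that finishes early has nothing left to pick up, which is the idle-core problem described above.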
With Thread::Pool, I initialize a pool of $numCPUs threads:
    my $pool = Thread::Pool->new(
        {
            workers => $numCPUs,
            do      => \&workerSub,
        }
    );
Then I submit the hrefs (i.e. jobs) to the thread pool, whose workers pick from the 10 hrefs, supposedly minimizing the time any core sits idle:
    foreach my $href (@hrefSubsets) {
        my $jobid = $pool->job( $href, $param1, $param2, $param3 );
        push(@threadPool, $jobid);
    }
I call result_any() to get the results of whichever threads finish first:
    for (1..$totalThreads) {
        my $results = $pool->result_any( \$jobid );
    }
When done, I shut down the pool:

    $pool->shutdown();

Unfortunately, I have not been able to achieve the performance I get with the traditional threads::shared approach. I realize that in my example there are only 10 jobs to process, but I've tested a subsample (100 jobs) of my actual data (2500 jobs) and it still doesn't perform up to par: the load-balanced version is still significantly faster than the pooled approach. Is the overhead of using Thread::Pool really that great?
Some additional info: Thread::Pool doesn't allow shared variables to be passed as parameters to the worker sub, but I can pass the worker sub a string that identifies the components of the globally shared variable (a hash) that the thread needs to process. So unless I'm missing some subtlety, I'm not creating copies of the (large) shared variable(s).
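To illustrate the key-passing idea, here is a rough sketch in which each job is just a key into a shared hash, and each worker pulls the next key as soon as it is free. Note that it uses the core Thread::Queue module in place of Thread::Pool, so it is a swapped-in technique for comparison, not the code from this post; %data, %result, the job names, and the squaring step are all made-up placeholders.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# Hypothetical shared data: job key => payload (here just a number).
my %data :shared;
$data{"job$_"} = $_ for 1 .. 10;

my %result :shared;

my $numCPUs = 3;
my $queue   = Thread::Queue->new();

# Each worker takes the next key whenever it is free, so a worker that
# draws cheap jobs simply ends up taking more of them.
sub worker {
    while (defined(my $key = $queue->dequeue())) {
        my $value = $data{$key} ** 2;    # stand-in for the real work
        lock(%result);
        $result{$key} = $value;
    }
    return;
}

my @workers = map { threads->create(\&worker) } 1 .. $numCPUs;

$queue->enqueue(sort keys %data);        # only the short key strings are queued
$queue->enqueue((undef) x $numCPUs);     # one end-of-work marker per worker
$_->join() for @workers;

printf "%d jobs done\n", scalar keys %result;
```

Only the key strings travel through the queue; the workers read the payload from the shared hash directly, which matches the intent of not copying the large structure per job.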
I'd appreciate any insight anyone may have into this issue.
Thank you!
Replies are listed 'Best First'.

Re: Efficiency of threads::shared versus Thread::Pool
by BrowserUk (Patriarch) on Aug 22, 2011 at 23:27 UTC

Re: Efficiency of threads::shared versus Thread::Pool
by zentara (Cardinal) on Aug 23, 2011 at 11:46 UTC

Re: Efficiency of threads::shared versus Thread::Pool
by locked_user sundialsvc4 (Abbot) on Aug 23, 2011 at 14:22 UTC