
Hi all,

I'm hoping someone has experience with threads::shared and Thread::Pool. I have a large amount of data that I "load balance" across the available $numCPUs (user-defined parameter) so that if I have:

(i) 10 hrefs in an array; and

(ii) 3 cores;

Core 1 gets: 3 hrefs

Core 2 gets: 3 hrefs

Core 3 gets: 4 hrefs

and they process away. I call:

$_->join() foreach (@threads);

to get the results. The problem with this approach, I've found, is that if the hrefs assigned to, say, Core 1 are much less complex than the rest, Core 1 finishes its jobs well before the other two cores and then sits idle until they are done. A potential solution (besides trying to estimate the complexity of each href and load balancing on that) is to use Thread::Pool.
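For concreteness, here is a minimal sketch of the static split described above. It is an illustration only, not the actual code: `partition` is a hypothetical helper, and the hrefs are stand-ins. It shows why the split is fixed up front, regardless of how long each href takes to process.

```perl
use strict;
use warnings;

# Hypothetical helper: split @$items into $n contiguous chunks whose sizes
# differ by at most one, with the larger chunks falling toward the end.
sub partition {
    my ($items, $n) = @_;
    my @chunks;
    my $total = scalar @$items;
    my $start = 0;
    for my $i (1 .. $n) {
        # Chunk $i ends just before index floor($i * $total / $n).
        my $end = int($i * $total / $n);
        push @chunks, [ @{$items}[ $start .. $end - 1 ] ];
        $start = $end;
    }
    return @chunks;
}

my @hrefs  = map { +{ id => $_ } } 1 .. 10;   # stand-ins for the real hrefs
my @chunks = partition(\@hrefs, 3);

# Prints the number of hrefs each core would receive.
print join(', ', map { scalar @$_ } @chunks), "\n";   # 3, 3, 4
```

Each chunk would then be handed to one thread, so a thread that happens to get only cheap hrefs has nothing left to pick up once it finishes.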

In this case, I initialize a pool of $numCPUs threads:

my $pool = Thread::Pool->new(
    {
        workers => $numCPUs,
        do => \&workerSub,
    }
);

Then I submit the hrefs (i.e., jobs) to the thread pool, whose workers pull from the 10 hrefs as they become free, which should minimize the time any core sits idle:

foreach my $href (@hrefSubsets) {
    my $jobid = $pool->job( $href, $param1, $param2, $param3 );
    push(@threadPool, $jobid);
}

I call result_any() to get the results of whichever threads finish first:

for (1 .. $totalThreads) {
    my $jobid;    # receives the ID of the job whose result is returned
    my $results = $pool->result_any( \$jobid );
}

When done, I shut down the pool:

$pool->shutdown();

Unfortunately, I have not been able to achieve the type of performance I get by using the traditional threads::shared approach. I realize that in my example, there are only 10 jobs to process, but I've tested a subsample (100 jobs) of my actual data (2500 jobs) and it still doesn't perform up to par -- using a load-balanced version is still significantly faster than a pooled approach. Is the overhead of using Thread::Pool really that great?

Some additional info: Thread::Pool doesn't allow shared variables to be passed as parameters to the worker sub. Instead, I pass the worker sub a string that identifies which components of the globally shared variable (a hash) the thread should process, so unless I'm missing some subtlety, I'm not creating copies of the (large) shared variable(s).
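To illustrate the key-passing pattern, here is a minimal sketch that uses plain `threads` and `threads::shared` rather than Thread::Pool. The hash contents, the key names, and `worker_sub` are all hypothetical stand-ins for the real data and worker.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical shared data: a shared hash whose values are shared arrays.
my %data :shared;
$data{$_} = shared_clone([ 1 .. 5 ]) for qw(setA setB);

# The worker receives only a key (a plain string) and looks the data up
# in the shared hash itself, so the structure is never passed as an argument.
sub worker_sub {
    my ($key) = @_;
    my $sum = 0;
    $sum += $_ for @{ $data{$key} };
    return "$key=$sum";
}

# One thread per key; only the key string crosses the call boundary.
my @threads = map { threads->create(\&worker_sub, $_) } qw(setA setB);
print join(', ', map { $_->join() } @threads), "\n";
```

In the pooled version, the string would be one of the arguments passed to `$pool->job(...)` in place of the href.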

I'd appreciate any insight anyone may have into this issue.

Thank you!

In reply to Efficiency of threads::shared versus Thread::Pool by traceyfreitas
