Based on the detailed explanation you have given, I feel that multiprocessing using fork() would be more appropriate than threads. I thought of using threads for only one reason: it would have avoided significant code changes, and I would still have benefited from parallel processing of the sub-tasks.
Hm. Nothing in the sparse details you've outlined gives me cause to reach that conclusion; especially if -- as you've suggested -- using fork would require a substantial re-write.
Let's say the basic structure of your current serial application is something like:
    #! perl -slw
    use strict;

    use constant {
        TOTAL_JOBS => 130,
    };

    for my $job ( 1 .. TOTAL_JOBS ) {
        open my $in, '<', 'bigdata.' . $job or die $!;
        my @localData = <$in>;
        close $in;

        ## do stuff with @localData
    }
Then converting that to concurrency using threads could be as simple as:

    #! perl -slw
    use strict;
    use threads;

    use constant {
        TOTAL_JOBS     => 130,
        MAX_CONCURRENT => 40,
    };

    for my $job ( 1 .. TOTAL_JOBS ) {
        ## Run the body of the original loop in its own thread
        async {
            open my $in, '<', 'bigdata.' . $job or die $!;
            my @localData = <$in>;
            close $in;

            ## do stuff with @localData
        };

        ## Throttle: wait while MAX_CONCURRENT threads are still running
        sleep 1 while threads->list( threads::running ) >= MAX_CONCURRENT;

        ## Reap any threads that have finished
        $_->join for threads->list( threads::joinable );
    }

    ## Wait for the stragglers, then reap them
    sleep 1 while threads->list( threads::running );
    $_->join for threads->list( threads::joinable );
But I see parallel processing giving around 40-50% (10 Hours) reduction in overall processing time, hence this question.
Given the capacity of the hardware you have available, I could well see the above reducing the runtime to less than 5% of the serial version; though the devil is in the details you have not provided.
Of course, using Parallel::ForkManager should allow a very similar time reduction, using a very similar minor modification of the existing code.
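For comparison, here is a minimal sketch of what the Parallel::ForkManager equivalent might look like, assuming the module is installed from CPAN. The sub names processJob and runJobs are hypothetical, introduced only for illustration; processJob stands in for the body of the serial loop above.

```perl
#! perl -slw
use strict;
use Parallel::ForkManager;

use constant {
    TOTAL_JOBS     => 130,
    MAX_CONCURRENT => 40,
};

## Hypothetical helper: the per-job body lifted from the serial loop
sub processJob {
    my $job = shift;

    open my $in, '<', 'bigdata.' . $job or die $!;
    my @localData = <$in>;
    close $in;

    ## do stuff with @localData
}

## Hypothetical helper: run $code->( $job ) for jobs 1 .. $total,
## with at most $max child processes alive at any one time
sub runJobs {
    my( $total, $max, $code ) = @_;

    my $pm = Parallel::ForkManager->new( $max );

    for my $job ( 1 .. $total ) {
        ## start() forks; it returns the child's pid (true) in the parent
        ## and 0 in the child, and blocks while $max children are running
        $pm->start and next;

        $code->( $job );    ## only the child gets here
        $pm->finish;        ## the child exits here
    }

    ## Block until every remaining child has been reaped
    $pm->wait_all_children;
}
```

Calling runJobs( TOTAL_JOBS, MAX_CONCURRENT, \&processJob ); would then process the 130 files, 40 at a time. Note that each job runs in a separate process, so anything a child computes has to be written somewhere (a file, a pipe) for the parent to see it.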
Why you feel that using fork should require a substantial re-write is far from obvious from the scant details you've provided. Ditto, the need for file-based counting and locking.
In reply to Re^3: ithreads or fork() what you recommend?
by BrowserUk
in thread ithreads or fork() what you recommend?
by pawan68923