elpis has asked for the wisdom of the Perl Monks concerning the following question:

I want to parallelize some Perl code. The code loops through multiple files and calls a subroutine for each file. I also need to share some read-only local data structures with the subroutine.

    sub process_in_parallel {
        my $readOnlySchema = foo();
        foreach my $file (@files) {
            validate_the_file($file, $readOnlySchema);
        }
    }

I am fairly new to Perl programming and hence need a lot of advice here. What are the perl modules that the perl monks can recommend for this scenario?

I tried some of the following:

- threads: The problem with this is managing the threads. Is there an efficient thread manager or thread pool library that can help me with this? I am also not sure if I can share the readOnly object easily.

- Parallel::ForkManager: The problem with this is that it forks processes rather than creating threads, and in my case that increases the execution time.

Can you please suggest other libraries as well?

I have also posted the same question on Stack Overflow: http://stackoverflow.com/questions/42391233/perl-modules-to-use-for-parallel-processing

Re: Perl modules that I can use for Multithreading
by Corion (Patriarch) on Feb 22, 2017 at 13:22 UTC

    Depending on your platform, fork() may be implemented via threads; this is the case on Windows, for example. There, using fork instead of threads doesn't improve things.

    Have you checked whether your process is bound by the time it takes to read each file? If reading each file from disk or from the network dominates your processing time, then parallelizing will only improve the overall processing time if you have not yet reached your I/O throughput limit.

    You could test this fairly easily by manually launching your program on three files at once (or on as many files as you have idle CPUs). If that improves the total processing time, then you have something to gain from a parallel approach.
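    Another rough way to see where the time goes is to time the read and the validation separately, using the core Time::HiRes module. This is only a sketch: validate_contents is a hypothetical stand-in for the real validator (it just counts lines), and it assumes each file can be slurped into memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the real validator: it only counts lines.
sub validate_contents {
    my ($contents, $schema) = @_;
    my $lines = () = $contents =~ /\n/g;
    return $lines;
}

# Accumulate time spent reading vs. time spent validating, over all files.
sub profile_files {
    my ($schema, @files) = @_;
    my ($read_time, $cpu_time) = (0, 0);
    for my $file (@files) {
        my $t0 = time;
        open my $fh, '<', $file or die "Cannot open $file: $!";
        my $contents = do { local $/; <$fh> };    # slurp the whole file
        close $fh;
        my $t1 = time;
        validate_contents($contents, $schema);
        $read_time += $t1 - $t0;
        $cpu_time  += time - $t1;
    }
    return ($read_time, $cpu_time);
}

my ($read, $cpu) = profile_files({}, @ARGV);
printf "read: %.3fs  validate: %.3fs\n", $read, $cpu;
```

    If the "read" figure dominates, adding workers mostly adds contention for the same disk or network link.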

    If you've determined that parallelizing gains you something, I would use the "worker pool" approach that you can find in most posts here by BrowserUk about threads:

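        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        my @files    = @ARGV;
        my $NUM_CPUS = 4;    # or whatever
        my $readOnlySchema = foo();    # built once; each thread gets its own copy
        my $jobs = Thread::Queue->new(@files);
        # We use undef as "end of jobs" marker, one per worker thread
        $jobs->enqueue(undef) for 1 .. $NUM_CPUS;

        sub process {
            # defined() so that a file named "0" doesn't end the loop early
            while (defined(my $file = $jobs->dequeue)) {
                validate_the_file($file, $readOnlySchema);
            }
        }

        my @threads = map { threads->create(\&process) } 1 .. $NUM_CPUS;
        $_->join for @threads;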

    You may or may not see an improvement by moving $readOnlySchema into &process, so that each thread builds its own copy. Data "shared" across threads can be problematic if it holds objects from XS modules like XML::LibXML.
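    Moving the schema into the worker could look roughly like the sketch below. foo and validate_the_file are hypothetical stubs for the original poster's functions, and run_pool is an illustrative name. Each thread builds its own schema once, before it starts taking files off the queue, so no schema object ever crosses a thread boundary. It needs a Perl built with thread support.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stubs for the poster's functions.
sub foo { return { required => ['name'] } }
sub validate_the_file {
    my ($file, $schema) = @_;
    return "$file ok";    # placeholder "validation" result
}

sub run_pool {
    my ($num_workers, @files) = @_;
    my $jobs = Thread::Queue->new(@files);
    $jobs->enqueue(undef) for 1 .. $num_workers;    # one end marker per worker

    my $worker = sub {
        # Each thread constructs its own private schema.
        my $readOnlySchema = foo();
        my @results;
        while (defined(my $file = $jobs->dequeue)) {
            push @results, validate_the_file($file, $readOnlySchema);
        }
        return @results;
    };

    # List context so that join() returns every result a worker produced.
    my @threads = map { threads->create({ context => 'list' }, $worker) } 1 .. $num_workers;
    return map { $_->join } @threads;
}

print "$_\n" for run_pool(4, @ARGV);
```

    Results come back in thread order, not file order, so sort or key them by file name if order matters.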

Re: Perl modules that I can use for Multithreading
by 1nickt (Canon) on Feb 22, 2017 at 13:23 UTC

    Hi elpis, welcome to the Monastery and to Perl, the One True Religion.

    See perhaps MCE::Shared.

    ( But if your execution time increased with Parallel::ForkManager, you may be doing work inside the forked processes that should be done before forking. )
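    To illustrate that point without depending on Parallel::ForkManager, here is a minimal sketch using Perl's built-in fork: the schema is built once in the parent, and every child inherits it (copy-on-write on Unix-like systems) instead of rebuilding it. foo and validate_the_file are hypothetical stubs, and validate_all is an illustrative name. Unlike Parallel::ForkManager, this sketch does not cap the number of concurrent children.

```perl
use strict;
use warnings;

$| = 1;    # flush output immediately so forked children don't re-emit buffered text

# Hypothetical stubs; imagine foo() is expensive to build.
sub foo { return { required => ['name'] } }
sub validate_the_file {
    my ($file, $schema) = @_;
    return exists $schema->{required} ? 0 : 1;    # exit status: 0 means valid
}

sub validate_all {
    my @files = @_;
    my $readOnlySchema = foo();    # expensive work done ONCE, before forking
    my %pid_to_file;
    for my $file (@files) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: reuse the inherited schema, handle one file, and exit.
            exit validate_the_file($file, $readOnlySchema);
        }
        $pid_to_file{$pid} = $file;
    }
    my %status;
    while ((my $pid = wait) != -1) {
        $status{ $pid_to_file{$pid} } = $? >> 8;    # child's exit code
    }
    return \%status;
}

my $status = validate_all(@ARGV);
print "$_: ", ($status->{$_} ? 'invalid' : 'valid'), "\n" for sort keys %$status;
```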

    Hope this helps!

Re: Perl modules that I can use for Multithreading
by hippo (Archbishop) on Feb 22, 2017 at 13:47 UTC
    threads: The problem with this is managing the threads. Is there an efficient thread manager or thread pool library that can help me with this?

    Personally, I have not found "managing the threads" to be problematic. Since I don't know quite why you perceive a problem there, there's not much I can suggest to solve it. What precisely is the problem as you see it?

    I am also not sure if I can share the readOnly object easily.
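        use threads;            # must be loaded before threads::shared
        use threads::shared;
        # shared_clone() deep-copies a nested structure into shared memory;
        # assigning a plain (non-shared) reference to a :shared scalar dies.
        my $readOnlySchema :shared = shared_clone(foo());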

    That seems pretty easy to me.
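    A small self-contained sketch of that idea, assuming foo (a hypothetical stub here) returns a nested hash reference. Three threads read the same shared schema; no locking is needed as long as nobody writes to it.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical stub returning a nested, non-shared structure.
sub foo { return { required => ['name', 'id'], version => 2 } }

# shared_clone() deep-copies the structure into shared memory.
my $readOnlySchema :shared = shared_clone(foo());

sub count_required_fields {
    # Read-only access to the shared schema.
    return scalar @{ $readOnlySchema->{required} };
}

my @threads = map { threads->create(\&count_required_fields) } 1 .. 3;
my @counts  = map { $_->join } @threads;
print "@counts\n";    # prints "2 2 2"
```

    Note that shared_clone copies plain Perl data; objects that wrap C structures from XS modules generally cannot be shared this way.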

Re: Perl modules that I can use for Multithreading
by Borodin (Sexton) on Feb 22, 2017 at 13:46 UTC
    This question has been cross-posted to Stack Overflow. Humorously, the cross-post also contains the phrase "What are the perl modules that the perl monks can recommend ... ?"
Re: Perl modules that I can use for Multithreading
by BrowserUk (Patriarch) on Feb 22, 2017 at 14:50 UTC

    How many files and how long does it take to validate them serially?