in reply to Perl modules that I can use for Multithreading

Depending on your platform, fork() may itself be implemented via threads; on Windows, for example, fork() is emulated on top of interpreter threads, so switching to fork() there won't improve anything.
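If you're unsure what your perl was built with, a quick check using the core Config module might look like this:

    use Config;

    # True if this perl has interpreter threads (ithreads) compiled in;
    # on MSWin32, fork() itself is emulated on top of them.
    print "ithreads: ", ( $Config{useithreads} ? "yes" : "no" ), "\n";
    print "OS:       $^O\n";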

Have you checked that your process isn't bound by the time it takes to read each file? If reading each file from disk or over the network dominates the processing time, parallelizing will only shorten the overall run if you haven't already hit the I/O bandwidth limit.

You can test this fairly easily by manually launching your program for three files at once (or as many as you have idle CPUs). If that shortens the total wall-clock time, you have something to gain from a parallel approach; a sketch of such a timing comparison follows below.
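For example, a rough timing comparison on a Unix-like system might look like this (a minimal sketch; validate.pl stands in for your actual validation program and is hypothetical):

    use strict;
    use warnings;
    use Time::HiRes qw(time);

    my @files = @ARGV[ 0 .. 2 ];    # three test files

    # Sequential baseline: one file after another
    my $start = time;
    system( $^X, 'validate.pl', $_ ) for @files;
    printf "sequential: %.1fs\n", time - $start;

    # Parallel: three concurrent child processes
    $start = time;
    my @pids;
    for my $file (@files) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # child: run one validation and exit
            exec $^X, 'validate.pl', $file or die "exec failed: $!";
        }
        push @pids, $pid;
    }
    waitpid( $_, 0 ) for @pids;
    printf "parallel:   %.1fs\n", time - $start;

fork() is used here only to launch independent processes for the measurement; on Windows you would simply start the three copies by hand instead.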

If you've determined that parallelizing gains you something, I would use the "worker pool" approach that you can find in most of BrowserUk's posts here about threads:

    use threads;
    use Thread::Queue;

    my @files    = @ARGV;
    my $NUM_CPUS = 4;    # or however many cores you have idle

    my $jobs = Thread::Queue->new(@files);

    # We use undef as the "end of jobs" marker, one per worker
    $jobs->enqueue(undef) for 1 .. $NUM_CPUS;

    my @threads = map { threads->create(\&process) } 1 .. $NUM_CPUS;

    sub process {
        # dequeue() blocks until an item is available; the loop
        # ends when this worker pulls one of the undef markers
        while ( defined( my $file = $jobs->dequeue ) ) {
            validate_the_file( $file, $readOnlySchema );
        }
    }

    $_->join for @threads;

You may or may not see an improvement by constructing $readOnlySchema inside &process rather than sharing it across threads. Data "shared" across threads can be problematic if you're using XS modules like XML::LibXML, since their objects don't always survive being copied into a new thread.
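If you try that, a minimal sketch of a per-worker schema might look like this (assuming XML::LibXML; $schema_file is a hypothetical path to your .xsd, and $jobs is the queue from above):

    use XML::LibXML;

    sub process {
        # Each worker constructs its own schema object instead of
        # sharing one across threads ($schema_file is hypothetical)
        my $schema = XML::LibXML::Schema->new( location => $schema_file );
        while ( defined( my $file = $jobs->dequeue ) ) {
            my $doc = XML::LibXML->load_xml( location => $file );
            # validate() dies on failure, so wrap it in eval
            eval { $schema->validate($doc); 1 }
                or warn "$file failed validation: $@";
        }
    }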