
Possibly you could improve performance by getting a list of files to be deleted with their full path name, and then allowing the workers to loop through that list, rather than handing them a top-level "file" that may be a directory. That would even out the workload among workers.
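As a rough sketch of that idea using only core modules: the two helpers below (list_files and deal are names made up for this illustration, not part of any module) collect every plain file under a root with its full path and then deal the list out round-robin, so each worker gets about the same number of files no matter how lopsided the directory tree is.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: return every plain file under $root with its full path.
sub list_files {
    my ($root) = @_;
    my @files;
    File::Find::find(
        {
            no_chdir => 1,    # do not chdir, so $File::Find::name stays usable as-is
            wanted   => sub {
                push @files, $File::Find::name if -f $File::Find::name;
            },
        },
        $root,
    );
    return @files;
}

# Hypothetical helper: deal @items out round-robin into $n buckets,
# returning a list of $n array refs of near-equal size.
sub deal {
    my ( $n, @items ) = @_;
    my @buckets = map { [] } 1 .. $n;
    push @{ $buckets[ $_ % $n ] }, $items[$_] for 0 .. $#items;
    return @buckets;
}

# Usage (the path is a placeholder):
#   my @buckets = deal( 10, list_files('/some/path') );
#   # hand $buckets[$i] to worker $i
```

Each bucket can then be handed to one forked worker, instead of giving a worker a whole directory of unknown size.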

If this is really a problem that can benefit from parallelization, i.e. there are really a lot of files and the task is not I/O bound (which I suspect it is), I would also consider a technique that employs chunking, so that each worker is given a block of files to process before pulling the next block. The excellent parallelization engine MCE provides this by default. The following is untested and lacks error checking, debug output, etc., but should give you some ideas:

use strict;
use warnings;

use Path::Iterator::Rule;
use MCE;

my $rule = Path::Iterator::Rule->new;
# Add constraints to the rule here

my $root = '/some/path';

# depthfirst => 1 returns a directory's contents before the directory itself
my $iter = $rule->iter( $root, { depthfirst => 1 } );

my @list;
while ( my $file = $iter->() ) {
    push @list, $file;
}

my $chunk_size = 100; # whatever makes sense for you

my $mce = MCE->new(
    user_func   => \&task,
    max_workers => 10,
    chunk_size  => $chunk_size,
);

$mce->process( \@list );
$mce->shutdown;

exit 0;

sub task {
    # MCE passes the worker an array ref holding one chunk of the input list
    my ( $mce, $chunk_ref, $chunk_id ) = @_;

    for my $file ( @{ $chunk_ref } ) {
        unlink($file) if -f $file;
        rmdir($file)  if -d $file;
    }
}

__END__
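One thing to watch with any parallel delete: rmdir only succeeds on an empty directory, so a worker can reach a directory before another worker has finished emptying it. A possible way around that, sketched here with plain fork rather than MCE, is to work in two phases: unlink the files in parallel, wait for all workers, then remove the directories deepest first. remove_in_phases is a made-up name for this illustration, and the sketch assumes a Unix-like system where fork is available.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: delete everything in @$paths using $workers processes.
# Returns the number of directories that were removed.
sub remove_in_phases {
    my ( $workers, $paths ) = @_;

    my @files = grep { -f $_ || -l $_ } @$paths;    # plain files and symlinks
    my @dirs  = grep { -d $_ && !-l $_ } @$paths;   # real directories only

    # Phase 1: each child unlinks every $workers-th file.
    my @pids;
    for my $w ( 0 .. $workers - 1 ) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            my @mine = @files[ grep { $_ % $workers == $w } 0 .. $#files ];
            unlink @mine;
            POSIX::_exit(0);    # leave the child without running END blocks
        }
        push @pids, $pid;
    }
    waitpid( $_, 0 ) for @pids;

    # Phase 2: longest path first, so a subdirectory always goes before its parent.
    my $removed = 0;
    for my $dir ( sort { length($b) <=> length($a) } @dirs ) {
        $removed++ if rmdir $dir;
    }
    return $removed;
}
```

The directory pass is left serial on the assumption that there are far fewer directories than files; if that does not hold for your tree, the directories could be grouped by depth and each depth level processed in parallel.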

Hope this helps!


The way forward always starts with a minimal test.

Re^4: Parallel::ForkManager for any array
by MissPerl (Sexton) on Oct 17, 2018 at 15:11 UTC
    Thank you 1nickt for the help!

    I'll go through it tomorrow.

    Earlier today I modified the code to instantiate a new Parallel::ForkManager object three times in the script.

    They then run serially: forks for the 1st subdirectories, forks for the 2nd subdirectories, and a last group of forks for the main directory.

    At this point I still think it isn't optimal, as the 1st and 2nd subdirectories should be run in parallel as well,

    but I haven't quite figured out how to get there.