in reply to Re: Parallel::ForkManager for any array
in thread Parallel::ForkManager for any array

Thanks, 1nickt, for the valuable input!

remove_tree is definitely new to me!

The code now runs and deletes the main directory, which contains 8 files and 2 subdirectories.

The 8 children handling the lightweight files return quickly enough, but the other two children, which were assigned to delete the 2 subdirectories, are still running.

I wonder if it is possible to let the children that have already returned help out, or to create more children during the run, so the whole deletion finishes sooner?
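
Roughly, the setup looks like this (a simplified, untested sketch, not my actual script; the path and the worker limit are placeholders):

use strict;
use warnings;
use File::Path qw(remove_tree);
use Parallel::ForkManager;

my $main = '/path/to/main';    # placeholder path
opendir my $dh, $main or die "Cannot open $main: $!";
my @entries = map { "$main/$_" } grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

my $pm = Parallel::ForkManager->new(10);
for my $entry (@entries) {
    $pm->start and next;          # parent keeps looping
    if ( -d $entry ) {
        remove_tree($entry);      # a subdirectory: the slow child
    }
    else {
        unlink $entry;            # a plain file: a fast child
    }
    $pm->finish;
}
$pm->wait_all_children;
rmdir $main;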


Re^3: Parallel::ForkManager for any array
by 1nickt (Canon) on Oct 17, 2018 at 13:08 UTC

    Possibly you could improve performance by getting a list of the files to be deleted, with their full path names, and then letting the workers loop through that list, rather than handing them a top-level "file" that may be a directory. That would even out the workload among the workers.
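
    Something along these lines, for example (untested; the path, the worker count, and the use of File::Find are just placeholders for whatever you already have):

    use strict;
    use warnings;
    use File::Find;
    use Parallel::ForkManager;

    my $root = '/some/path';    # placeholder

    # Build a flat list of plain files with their full path names.
    my @files;
    find( sub { push @files, $File::Find::name if -f }, $root );

    # Loop over the flat list: each child gets a single file, so no
    # worker is stuck with an entire subdirectory.
    my $pm = Parallel::ForkManager->new(10);
    for my $file (@files) {
        $pm->start and next;    # parent keeps looping
        unlink $file;
        $pm->finish;
    }
    $pm->wait_all_children;

    # The emptied directories can then be removed in one quick pass.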

    If this is really a problem that can benefit from parallelization, i.e. there are really a lot of files, and the task is not I/O bound (as I suspect), I would consider also using a technique that employs chunking, so each worker is given a block of files to process before pulling the next one, such as is provided by default by the excellent parallelization engine MCE. The following is untested and lacks error checking, debug output, etc., but should give you some ideas:

    use strict;
    use warnings;

    use Path::Iterator::Rule;
    use MCE;

    my $rule = Path::Iterator::Rule->new;
    # Add constraints to the rule here

    my $root = '/some/path';
    my $iter = $rule->iter( $root, { depthfirst => 1 } );

    my @list;
    while ( my $file = $iter->() ) {
        push @list, $file;
    }

    my $chunk_size = 100;    # whatever makes sense for you

    my $mce = MCE->new(
        user_func   => \&task,
        max_workers => 10,
        chunk_size  => $chunk_size,
    );
    $mce->process( \@list );
    $mce->shutdown;

    exit 0;

    sub task {
        # With chunking, each worker receives an array ref holding a
        # block of paths rather than a single path.
        my ( $mce, $chunk_ref, $chunk_id ) = @_;    # $mce not used here
        for my $file ( @{$chunk_ref} ) {
            unlink($file) if -f $file;
            rmdir($file)  if -d $file;
        }
    }

    __END__

    Hope this helps!


    The way forward always starts with a minimal test.
      Thank you 1nickt for the help!

      I'll go through it tomorrow.

      Earlier today I modified the code to instantiate a new Parallel::ForkManager object three times in the script,

      then had them run serially: one group of forks for the 1st subdirectory, one group for the 2nd subdirectory, and a last group of forks for the main directory.

      At this point I still think it isn't optimal, as the 1st and 2nd subdirectories should be processed in parallel as well,

      but I haven't quite figured out how to get there.
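
      Roughly, the current version looks like this (a simplified, untested sketch; directory names and the worker limit are placeholders):

      use strict;
      use warnings;
      use Parallel::ForkManager;

      # One pass per directory: fork a child per file, then wait for
      # the whole group before starting the next pass.
      sub delete_files_in {
          my ( $dir, $max_workers ) = @_;

          opendir my $dh, $dir or die "Cannot open $dir: $!";
          my @entries = map { "$dir/$_" } grep { !/^\.\.?$/ } readdir $dh;
          closedir $dh;

          my $pm = Parallel::ForkManager->new($max_workers);
          for my $entry (@entries) {
              $pm->start and next;    # parent keeps looping
              unlink $entry if -f $entry;
              $pm->finish;
          }
          $pm->wait_all_children;     # block until this pass is done
      }

      # Pass 1 and 2: the two subdirectories, one after the other.
      delete_files_in( '/main/subdir1', 10 );
      delete_files_in( '/main/subdir2', 10 );

      # Pass 3: whatever is left directly under the main directory.
      delete_files_in( '/main', 10 );
      rmdir '/main/subdir1';
      rmdir '/main/subdir2';
      rmdir '/main';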