Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: Perl threads - parallel run not working

by marioroy (Prior)
on Sep 16, 2015 at 08:50 UTC ( [id://1142174]=note: print w/replies, xml ) Need Help??


in reply to Perl threads - parallel run not working

Some time ago, Yary and I exchanged emails for walking directories in parallel. The following is our collaborative effort modified for the OP task.

use strict; use warnings; # usage: ./collect_files.pl <user> /var/tmp /tmp # output is saved under /var/tmp/test_<user>.file.list package Number; sub new { my ($class, $self) = (shift, shift || 0); bless \$self, $class; } sub decr { lock $_[0]; --${ $_[0] } } sub incr { lock $_[0]; ++${ $_[0] } } sub get { lock $_[0]; ${ $_[0] } } package Main; use threads; use threads::shared; use Thread::Queue; use File::Spec::Functions qw(catfile); use MCE; # Get user info and directories to start with my $user = shift; my $uidNum = getpwnam($user); my @start_dirs = @ARGV or die "Please specify one or more starting directories, stopped"; -d or die "No such directory: $_, stopped" for @start_dirs; # Shared queue and counter my $shared_work = Thread::Queue->new( @start_dirs ); my $free_count = shared_clone( Number->new(0) ); # Open output file open my $user_file, ">", "/var/tmp/test_$user.file.list" or die "cannot open file: $!"; sub traverse { $free_count->decr; my ( $dev, $ino, $mode, $nlink, $uid, $gid ); my @work = $_; while ( $_ = shift @work, defined ) { my ( $dir, $path, @paths ) = ( $_ ); opendir DH, $dir or next; for ( readdir DH ) { next if $_ eq '.' || $_ eq '..'; if ( -d ( $path = catfile($dir,$_) ) ) { ( @work < 15 ) ? push @work, $path : $shared_work->enqueue($path); next; } push @paths, $path; } for my $file ( @paths ) { if ( -f $file ) { ( $dev, $ino, $mode, $nlink, $uid, $gid ) = lstat( $file ) +; MCE->print( $user_file, "$file:$uidNum:$gid:$mode\n" ) if $uid == $uidNum; } } } # Done with our work, let everyone know we're free $shared_work->enqueue( (undef) x MCE->max_workers ) if $free_count->incr == MCE->max_workers && !$shared_work->pendi +ng; } my $mce = MCE->new( max_workers => 8, user_begin => sub { $free_count->incr }, user_func => sub { traverse() while ($_ = $shared_work->dequeue, de +fined) }, ); $mce->run; close $user_file;

Kind regards, Mario

Replies are listed 'Best First'.
Re^2: Perl threads - parallel run not working
by marioroy (Prior) on Sep 16, 2015 at 08:57 UTC

    Updated September 5, 2016: Modified code to run with MCE 1.8 / MCE::Shared 1.8.

    The following runs with Perl not built with threads support.

    use strict; use warnings; # usage: ./collect_files.pl <user> /var/tmp /tmp # output is saved under /var/tmp/test_<user>.file.list use File::Spec::Functions qw(catfile); use MCE; use MCE::Queue; use MCE::Shared; # Get user info and directories to start with my $user = shift; my $uidNum = getpwnam($user); my @start_dirs = @ARGV or die "Please specify one or more starting directories, stopped"; -d or die "No such directory: $_, stopped" for @start_dirs; # Shared queue and counter my $shared_work = MCE::Queue->new( fast => 1, queue => \@start_dirs ); my $free_count = MCE::Shared->scalar( 0 ); # Open output file open my $user_file, ">", "/var/tmp/test_$user.file.list" or die "cannot open file: $!"; sub traverse { $free_count->decr; my ( $dev, $ino, $mode, $nlink, $uid, $gid ); my @work = $_; while ( $_ = shift @work, defined ) { my ( $dir, $path, @paths ) = ( $_ ); opendir DH, $dir or next; for ( readdir DH ) { next if $_ eq '.' || $_ eq '..'; if ( -d ( $path = catfile($dir,$_) ) ) { ( @work < 15 ) ? push @work, $path : $shared_work->enqueue($path); next; } push @paths, $path; } for my $file ( @paths ) { if ( -f $file ) { ( $dev, $ino, $mode, $nlink, $uid, $gid ) = lstat( $file ) +; MCE->print( $user_file, "$file:$uidNum:$gid:$mode\n" ) if $uid == $uidNum; } } } # Done with our work, let everyone know we're free $shared_work->enqueue( (undef) x MCE->max_workers ) if $free_count->incr == MCE->max_workers && !$shared_work->pendi +ng; } my $mce = MCE->new( max_workers => 8, user_begin => sub { $free_count->incr }, user_func => sub { traverse() while ($_ = $shared_work->dequeue, de +fined) }, ); $mce->run; close $user_file;

    Kind regards, Mario

Re^2: Perl threads - parallel run not working
by Yary (Pilgrim) on Sep 16, 2015 at 15:21 UTC
    Processing large directory trees in parallel is an interesting problem, as "a directory with huge number of files" could be split up differently than "a huge number of branches" depending on one's approach. The method above should work with either (and is even more general than Gnu "parallel").

    File::Find can be made to work in parallel, but it is ugly. Cleaner to start new as we did here. I keep meaning to blog about this alternate solution (have post half-written) and turn it into a CPAN module...

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1142174]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (4)
As of 2024-04-25 12:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found