
use Devel::Pointer;
...
    my $obj = deref ( $addr ) ;            
    my $root = shift ( @{$obj->{dirToFetch}} ) ;

Why are you doing that?

Perl is not C and you don't need to step outside the Perl datatypes to handle data access from multiple threads within Perl.

The following should be the equivalent of what you do, except far saner and not needing Devel::Pointer:

...
sub _walk {
    my $obj = shift;
    my $root = shift ( @{$obj->{dirToFetch}} ) ;
    $obj -> _fetchDir ( $root ) ;    
}
...
        threads->create ( '_walk' , $self ) -> join;

Note that the above does not actually run anything in parallel: you spawn a separate thread but then call join on it right away, so your program does not continue until that thread has finished its work. Most likely, a better approach is to store all threads and then wait for them to finish:

...
my @running;    # declare once, outside whatever loop spawns the threads
...
        push @running, threads->create ( '_walk' , $self );
...
while( @running ) {
    my $next = shift @running;
    $next->join;
};
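To make the "start everything first, join afterwards" pattern concrete, here is a minimal self-contained sketch. The `work` sub and the count of four threads are placeholders of mine, not anything from your code; the only point is that all threads are created before the first join.

```perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the real per-directory work.
sub work {
    my ($id) = @_;
    return $id * 2;
}

# Start every thread first ...
my @running = map { threads->create( \&work, $_ ) } 1 .. 4;

# ... and only then wait for them, collecting each return value.
my @results = map { $_->join } @running;

print "@results\n";
```

Because join is called in the order the threads were created, the results come back in that order regardless of which thread finishes first.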

Personally, I recommend using Thread::Queue and a worker pool to handle a workload, because starting a Perl thread is relatively resource intensive. I'm not sure that using multiple threads will bring you much benefit, though, as I think your operation is largely limited by network or hard disk (or filesystem) performance.
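As a rough illustration of what a worker pool looks like, here is a small sketch. The pool size, the queue names and the squaring "job" are all made up for the example, and it assumes a Thread::Queue recent enough to have the `end` method (3.01 or later, as far as I know).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $POOL_SIZE = 4;                      # assumed pool size
my $jobs      = Thread::Queue->new;
my $results   = Thread::Queue->new;

# Each worker is started once and then reused for many jobs.
sub worker {
    while ( defined( my $job = $jobs->dequeue ) ) {
        $results->enqueue( $job * $job );    # placeholder for real work
    }
}

my @pool = map { threads->create( \&worker ) } 1 .. $POOL_SIZE;

$jobs->enqueue( 1 .. 10 );
$jobs->end;    # no more jobs: dequeue returns undef once the queue is empty

$_->join for @pool;

# All workers are done, so a non-blocking drain of the results is enough.
my @squares;
while ( defined( my $r = $results->dequeue_nb ) ) {
    push @squares, $r;
}
@squares = sort { $a <=> $b } @squares;
print "@squares\n";
```

The thread start-up cost is paid only `$POOL_SIZE` times here, no matter how many jobs go through the queue.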

Thinking more about it, I guess that a somewhat better approach is to have all directories to crawl stored in a Thread::Queue and to have threads fetch from that whenever they need to crawl a new directory. For output, I would use another Thread::Queue, just for simplicity (roughly adapted from here):

#! perl -slw
use strict;
use threads;
use Thread::Queue;

my $directories = Thread::Queue->new();
my $files = Thread::Queue->new();

use vars '$NUM_CPUS';
$NUM_CPUS ||= 4;

sub _walk {
    while( defined( my $dir = $directories->dequeue ) ) {
        my @entries = ...;    # elided here: list the entries of $dir
        for my $e (@entries) {
            if( -d $e ) {
                # depth-first search
                $directories->insert(0, $e);
            } else {
                # It would be much faster to enqueue all files in bulk instead
                # of enqueueing them one by one, but first get it working before
                # you make it fast
                $files->enqueue( $e );
            };
        };
    };
}

$directories->enqueue( @ARGV );

for ( 1..$NUM_CPUS ) {
    threads->new( \&_walk )->detach;
};

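# Note: nothing in this sketch ever ends the queues, so once the tree is
# exhausted this loop blocks and 'Done' is never printed. Thread::Queue's
# end method is one way to signal that no more items will arrive.
print while defined( $_ = $files->dequeue );

print 'Done';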
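For what it is worth, here is one way the two-queue idea could be fleshed out so that it also terminates. Treat it as a sketch under assumptions, not as tested production code: `read_entries`, `add_dirs`, `$pending` and `$NUM_WORKERS` are names I made up, symlinks are deliberately not followed, directories are appended with `enqueue` rather than `insert(0, ...)` (so the traversal is not depth-first), and it again assumes a Thread::Queue with `end`. The demo builds a throwaway tree with File::Temp instead of reading `@ARGV`.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

my $NUM_WORKERS = 4;                       # assumed worker count
my $directories = Thread::Queue->new;
my $files       = Thread::Queue->new;
my $pending :shared = 0;    # directories queued or currently being read

# Hypothetical helper: full paths of the entries in $dir, minus . and ..
sub read_entries {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or return;
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } @names;
}

# Count the directories as pending *before* they become visible to workers.
sub add_dirs {
    my @dirs = @_;
    return unless @dirs;
    { lock($pending); $pending += @dirs; }
    $directories->enqueue(@dirs);
}

sub walk {
    while ( defined( my $dir = $directories->dequeue ) ) {
        my ( @subdirs, @found );
        for my $e ( read_entries($dir) ) {
            # Assumption: symlinked directories are reported, not descended.
            if ( -d $e && !-l $e ) { push @subdirs, $e }
            else                   { push @found,   $e }
        }
        $files->enqueue(@found) if @found;    # bulk enqueue per directory
        add_dirs(@subdirs);

        # This directory is finished; the last one to finish ends the queues.
        lock($pending);
        if ( --$pending == 0 ) {
            $directories->end;
            $files->end;
        }
    }
}

# Demo on a throwaway tree; a real run would pass its roots to add_dirs.
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/a/b", "$root/c" );
for my $f ( "$root/top.txt", "$root/a/one.txt", "$root/a/b/two.txt" ) {
    open my $fh, '>', $f or die "Cannot create $f: $!";
    close $fh;
}

add_dirs($root);
my @pool = map { threads->create( \&walk ) } 1 .. $NUM_WORKERS;

my @all;
while ( defined( my $f = $files->dequeue ) ) { push @all, $f }
$_->join for @pool;

print "$_\n" for sort @all;
print "Done\n";
```

The counter is what makes termination work: a directory's subdirectories are added to `$pending` before the directory itself is subtracted, so the count can only reach zero when nothing is queued and no worker is in the middle of reading a directory.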

In reply to Re: use threads for dir tree walking really hurts by Corion
in thread use threads for dir tree walking really hurts by exilepanda

	PerlMonks