
use Devel::Pointer;
...
    my $obj = deref ( $addr ) ;            
    my $root = shift ( @{$obj->{dirToFetch}} ) ;

Why are you doing that?

Perl is not C and you don't need to step outside the Perl datatypes to handle data access from multiple threads within Perl.

The following should be the equivalent of what you do, except far saner and not needing Devel::Pointer:

...
sub _walk {
    my $obj = shift;
    my $root = shift ( @{$obj->{dirToFetch}} ) ;
    $obj -> _fetchDir ( $root ) ;    
}
...
        threads->create ( '_walk' , $self ) -> join;

Note that the above never actually runs multiple threads at the same time: you spawn a separate thread and then immediately join it, so the main thread does not continue until that thread has finished its work. Most likely, a better approach is to store all threads and then wait for them to finish:

my @running;    # declared once, outside the code that spawns the threads
...
        push @running, threads->create ( '_walk' , $self );
...
while( @running ) {
    my $next = shift @running;
    $next->join;
};
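
To make the spawn-first, join-later pattern concrete, here is a minimal self-contained sketch. The `work` sub and the names `a`, `b`, `c` are hypothetical stand-ins for your per-directory work, not something from your code:

```perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the real per-directory work.
sub work {
    my ($dir) = @_;
    return "walked $dir";
}

# Spawn every thread first ...
my @running = map { threads->create( \&work, $_ ) } qw( a b c );

# ... and only then wait for them, collecting each return value.
my @results = map { $_->join } @running;

print "$_\n" for @results;
```

Because every thread is created before the first `join`, the threads can overlap. The results come back in the order the threads were created, not the order in which they finish.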

Personally, I recommend using Thread::Queue and a worker pool to handle a workload, because starting a Perl thread is relatively resource-intensive. I'm not sure that using multiple threads will bring you much benefit, as I think your operation is largely limited by the network or by hard disk (or filesystem) performance.

Thinking more about it, I guess that a somewhat better approach is to store all directories to crawl in a Thread::Queue and to have the threads fetch from it whenever they need a new directory to crawl. For output, I would use another Thread::Queue, just for simplicity (roughly adapted from here):

#! perl -slw
use strict;
use threads;
use Thread::Queue;

my $directories = Thread::Queue->new();
my $files = Thread::Queue->new();

use vars '$NUM_CPUS';
$NUM_CPUS ||= 4;

sub _walk {
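    # defined() needs parentheses around the assignment, otherwise it binds to "my $dir" only
    while( defined( my $dir = $directories->dequeue ) ) {
        my @entries = ...;    # placeholder: list the entries of $dir here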
        for my $e (@entries) {
            if( -d $e ) {
                # depth-first search
                $directories->insert(0, $e);
            } else {
                # It would be much faster to enqueue all files in bulk instead
                # of enqueueing them one by one, but first get it working before
                # you make it fast
                $files->enqueue( $e );
            };
        };
    };
}

$directories->enqueue( @ARGV );

for ( 1..$NUM_CPUS ) {
    threads->new( \&_walk )->detach;
};

print while defined( $_ = $files->dequeue );

print 'Done';
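
As posted, the program above leaves the directory listing elided and has no way to notice that all directories have been processed, so the workers and the final `print` loop keep waiting on their queues. The following is only a sketch of one way to fill those gaps, not a definitive implementation. It assumes Thread::Queue 3.01 or newer (for `->end`), a Perl built with thread support, and plain `opendir`/`readdir` for the listing. The names `walk_tree` and `$pending` are made up for this example. The shared counter `$pending` tracks directories that are queued or being processed; when it drops to zero, both queues are ended.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;    # ->end needs Thread::Queue 3.01 or newer

# walk_tree() is a hypothetical helper, not part of the original post.
# It returns every non-directory path found below @roots.
sub walk_tree {
    my ( $workers, @roots ) = @_;
    return if !@roots;

    my $directories = Thread::Queue->new;
    my $files       = Thread::Queue->new;

    # Directories that are queued or being processed right now.
    my $pending :shared = scalar @roots;
    $directories->enqueue(@roots);

    my $walk = sub {
        while ( defined( my $dir = $directories->dequeue ) ) {
            my ( @subdirs, @plain );
            if ( opendir my $dh, $dir ) {
                for my $name ( readdir $dh ) {
                    next if $name eq '.' or $name eq '..';
                    my $path = "$dir/$name";
                    # Do not descend into symlinks, to avoid cycles.
                    if ( -d $path and not -l $path ) { push @subdirs, $path }
                    else                             { push @plain,   $path }
                }
                closedir $dh;
            }
            $files->enqueue(@plain) if @plain;    # one bulk enqueue per directory
            {
                lock $pending;
                $pending += @subdirs;
                $directories->insert( 0, @subdirs ) if @subdirs;    # depth-first
                if ( --$pending == 0 ) {    # nothing queued, nobody working
                    $directories->end;
                    $files->end;
                }
            }
        }
    };

    my @threads = map { threads->create($walk) } 1 .. $workers;

    # The main thread drains the output queue until the workers end it.
    my @found;
    while ( defined( my $file = $files->dequeue ) ) {
        push @found, $file;
    }
    $_->join for @threads;
    return @found;
}

my @found = walk_tree( 4, @ARGV ? @ARGV : '.' );
print "$_\n" for sort @found;
print "Done\n";
```

The workers are joined rather than detached here, so the program only prints `Done` after every thread has really finished. Whether four workers are faster than one still depends on whether the disk or the network is the bottleneck.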

In reply to Re: use threads for dir tree walking really hurts by Corion
in thread use threads for dir tree walking really hurts by exilepanda
