
To traverse a directory tree and process some or all of the data files in it, this method is fast, uses very little memory, and is a relatively simple framework for handling many jobs of this kind. It relies on the standard Unix "find" utility (which has also been ported to MS Windows, of course).


# assume you have a $toppath, which is where the traversal starts

chdir $toppath or die "can't cd to $toppath: $!";

open( FIND, "find . -type d -print0 |" ) or die "can't run find: $!";

# find will traverse downward from current directory
# (i.e. $toppath), and because of the "-type d" option,
# will only list the paths of directories contained here;
# the "-print0" (thanks, etcshadow) sets a null byte as the
# string terminator for each file name (don't rely on "\n",
# which could be part of a file name).

{
  local $/ = "\x0";  # added thanks to etcshadow's reply
  while ( my $dir = <FIND> ) {
    chomp $dir;
    unless ( opendir( DIR, $dir )) {
        warn "$toppath/$dir: opendir failed: $!\n";
        next;
    }
    while ( my $file = readdir( DIR )) {
        next if ( -d "$dir/$file" ); # outer while loop will handle all dirs
        # do what needs to be done with data files
    }
    closedir DIR;
    # anything else we need to do regarding this directory
  }
}
close FIND;

Comments:

The nice thing about this approach is that the "find" utility is very good at recursive descent into subdirectories, and that is all it needs to do. Meanwhile, Perl is very good at reading directory contents and manipulating data files, which is easy when you work with the files of one directory at a time. Perl can simply skip any subdirectories it sees, because the output from "find" will bring each of them up for treatment in due course.
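
As a minimal sketch of the same division of labor, the loop can be wrapped in a subroutine that uses lexical handles and the list form of a pipe open, so that "find" is run directly rather than through a shell. The name for_each_data_file and its callback argument are hypothetical, not part of the original code, and the sketch assumes a Unix-like system whose "find" supports -print0 (the GNU and BSD versions do).

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback->($dir, $file) for every
# non-directory entry found below $toppath.
sub for_each_data_file {
    my ( $toppath, $callback ) = @_;

    # List-form pipe open: find is executed directly, no shell involved.
    open( my $find, '-|', 'find', $toppath, '-type', 'd', '-print0' )
        or die "can't run find: $!";

    # Directory names arrive null-terminated. Note that the callback
    # also sees this value of $/; it should localize $/ again if it
    # reads files line by line.
    local $/ = "\0";
    while ( my $dir = <$find> ) {
        chomp $dir;
        my $dh;
        unless ( opendir( $dh, $dir ) ) {
            warn "$dir: opendir failed: $!\n";
            next;
        }
        while ( defined( my $file = readdir($dh) ) ) {
            next if -d "$dir/$file";    # find lists real subdirectories itself
            $callback->( $dir, $file );
        }
        closedir $dh;
    }
    close $find;
    return;
}
```

A call such as for_each_data_file( $toppath, sub { print "$_[0]/$_[1]\n" } ) would print one path per data file. Because $toppath is handed to "find" as the starting point, no chdir is needed and the reported directory names already include it.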

(Update: made minor adjustments to the comments in the code and added "closedir". Also wanted to point out that the loop over files could be narrowed to the files of interest by using "grep ... readdir(DIR)", etc.)
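
The "grep ... readdir(DIR)" idea might look like the sketch below. The helper name data_files_in and the default .txt pattern are assumptions for illustration only; substitute whatever test picks out the files you care about.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted names of plain files in $dir
# whose names match $pattern (default: names ending in ".txt").
sub data_files_in {
    my ( $dir, $pattern ) = @_;
    $pattern = qr/\.txt\z/ unless defined $pattern;

    opendir( my $dh, $dir ) or die "$dir: opendir failed: $!";

    # In list context readdir returns every entry at once; grep keeps
    # only matching names that are regular files, which also drops
    # "." and ".." and any subdirectories.
    my @files = grep { /$pattern/ && -f "$dir/$_" } readdir($dh);
    closedir $dh;

    @files = sort @files;
    return @files;
}
```

Inside the outer loop this would replace the inner while loop with something like "for my $file ( data_files_in($dir) ) { ... }". The trade-off is that the whole directory listing is held in memory at once, which only matters for very large directories.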

In reply to An alternative to File::Find by graff
