
Using File::Find will likely be faster, simply because File::Find chdir()s into each directory as it recurses. That means you end up doing things like stat("file.txt") instead of stat("root/subdir/subsubdir/file.txt"). The long form has to at least parse the whole path every time, and probably makes the system look up each directory named in it every time as well.
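A small sketch of the behaviour described above, assuming default File::Find options (no no_chdir): inside the "wanted" callback the current directory is already $File::Find::dir, so a stat on the short name in $_ reaches the same entry (same device and inode) as a stat on the full $File::Find::name. The throwaway tree built with File::Temp is only for illustration, and this checks correctness, not speed.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Build a throwaway tree: $root/a/b/file.txt
my $root = tempdir(CLEANUP => 1);
my $mid  = File::Spec->catdir($root, 'a');
my $deep = File::Spec->catdir($mid, 'b');
mkdir $mid  or die "mkdir $mid: $!";
mkdir $deep or die "mkdir $deep: $!";
my $file = File::Spec->catfile($deep, 'file.txt');
open my $fh, '>', $file or die "open $file: $!";
print {$fh} "hello\n";
close $fh;

my ($checked, $mismatch) = (0, 0);
find(sub {
    # By default File::Find has already chdir()ed into $File::Find::dir,
    # so the short name in $_ is enough to reach the entry.
    my @short = stat($_);
    my @long  = stat($File::Find::name);    # the same entry via its full path
    ++$checked;
    # Device and inode numbers agree when both names reach the same entry.
    ++$mismatch unless @short && @long
        && $short[0] == $long[0] && $short[1] == $long[1];
}, $root);

print "checked=$checked mismatch=$mismatch\n";
```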

Another way to make your code faster is to use the special filehandle _ as the target of your file tests. It refers to the result of the most recent stat, lstat, or file test, so you can get more data about the same file without making Perl call stat over and over.
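A minimal sketch of the _ filehandle: one real stat call, then several file tests answered from Perl's cached stat buffer. The temporary file is just a stand-in for whatever you are scanning.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway file holding exactly 5 bytes.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "12345";
close $fh;

# One real stat system call; Perl keeps the result in its stat buffer.
stat($path) or die "stat $path: $!";

# File tests on the special filehandle _ read that cached buffer
# instead of asking the operating system again.
my $is_dir = -d _;
my $size   = -s _;    # size in bytes
my $age    = -C _;    # days since the inode changed, relative to script start ($^T)
printf "dir=%d size=%d age=%.4f days\n", ($is_dir ? 1 : 0), $size, $age;
```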

The trick with File::Find is sharing variables between related calls to your "wanted" subroutine (the calls made during one traversal) without sharing them between unrelated calls (those made during a different traversal).
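One way to get that scoping, sketched here as an alternative to passing variables through @_: make the counters lexicals inside a function and let an anonymous "wanted" sub close over them. Every callback in one find() call shares the same counters, while each new call to the function gets fresh ones. tree_totals is a hypothetical name, and it leaves out the age range checks for brevity.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: walk one tree and return its totals.
# The counters are fresh lexicals on every call; the anonymous "wanted"
# sub closes over them, so callbacks within one traversal share them,
# but separate traversals never do.
sub tree_totals {
    my ($root) = @_;
    my ($files, $dirs, $bytes) = (0, 0, 0);
    find(sub {
        stat($_) or return;              # fill the _ buffer once per entry
        if (-d _) {
            ++$dirs unless $_ eq '.';    # '.' is the starting directory
        } else {
            ++$files;
            $bytes += -s _;
        }
    }, $root);
    return { files => $files, dirs => $dirs, bytes => $bytes };
}

# Usage: a throwaway tree with two 3-byte files and one subdirectory.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $f ("$root/x.txt", "$root/sub/y.txt") {
    open my $fh, '>', $f or die "open $f: $!";
    print {$fh} "abc";
    close $fh;
}
my $t1 = tree_totals($root);
my $t2 = tree_totals($root);    # second, independent traversal: not doubled
print "files=$t1->{files} dirs=$t1->{dirs} bytes=$t1->{bytes}\n";
```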

You could do something very similar to your original code with:

find(
    sub {
        # "ignored" fills $_[0], so the counters arrive as $_[1] .. $_[5]
        filestat( "ignored",
            $file_count, $dir_count, $total_size,
            $aged_file_count, $aged_total_size );
    },
    $pathname
);

and then rip out most of your "pathstat" and rename it "filestat":

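sub filestat {
    # Called with $_ set to the current entry's name, relative to the
    # directory File::Find has chdir()ed into. $lowrange and $highrange
    # (ages in days) are assumed to be set in the enclosing scope.
    if (-d $_) {
        if ($_ ne "." && $_ ne "..") {
            ++$_[2];            # $dir_count
        }
    } else {
        ++$_[1];                # $file_count
        $_[3] += -s _;          # $total_size; _ reuses the stat done by -d
        my $file_age = (-C _);
        if ($file_age >= $lowrange && $file_age <= $highrange) {
            ++$_[4];            # $aged_file_count
            $_[5] += -s _;      # $aged_total_size
        }
    }
}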

but it is possible to clean that up much more.
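The reason filestat can update $file_count and friends without returning anything is that the elements of @_ are aliases to the caller's variables. A minimal sketch with a hypothetical bump sub; it also shows why the "ignored" placeholder must never be assigned to, since a literal passed as an argument is read-only.

```perl
use strict;
use warnings;

# Hypothetical sub: writes through the @_ aliases, the same way filestat()
# updates $file_count and friends.
sub bump {
    ++$_[0];        # changes the caller's first argument in place
    $_[1] += 10;    # changes the caller's second argument in place
}

my ($count, $total) = (0, 0);
bump($count, $total) for 1 .. 3;
print "count=$count total=$total\n";    # count=3 total=30

# A literal argument is read-only, so writing through its alias dies.
my $ok = eval { bump(1, 2); 1 };
print "writing to a literal ", ($ok ? "worked\n" : "died: $@");
```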

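If I were going for maximal speed, I'd keep this aliasing approach but make the code easier to read and maintain by using symbolic constants instead of literal indices. Dropping the "ignored" placeholder shifts the indices to 0 through 4. A sub with an empty () prototype and a constant body is inlined by Perl at compile time, so the names cost nothing at run time: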

use File::Find;

# Slot numbers for the counters passed to filestat()
sub iFileCount()     { 0; }
sub iDirCount()      { 1; }
sub iTotalSize()     { 2; }
sub iAgedFileCount() { 3; }
sub iAgedTotalSize() { 4; }

find(
    sub {
        filestat(
            $file_count, $dir_count, $total_size,
            $aged_file_count, $aged_total_size );
    },
    $pathname
);

sub filestat {
    my($file_age);
    my($file_size);
    if (-d $_) {
        if ($_ ne "." && $_ ne "..") {
            ++$_[iDirCount];
        }
    } else {
        ++$_[iFileCount];
        $file_size = (-s _);    # reuse the stat buffer filled by -d
        $_[iTotalSize] += $file_size;
        $file_age = (-C _);
        if ($file_age >= $lowrange && $file_age <= $highrange) {
            ++$_[iAgedFileCount];
            $_[iAgedTotalSize] += $file_size;
        }
    }
}
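If you prefer, the same slot names can be declared with the core constant pragma, which also produces inlinable constants. add_file below is a hypothetical, stripped-down stand-in for filestat that takes the file size as an extra last argument instead of calling -s, so the sketch runs without touching the filesystem.

```perl
use strict;
use warnings;

# The post's slot names, declared with the core "constant" pragma instead
# of hand-written empty-prototype subs; both forms can be inlined.
use constant {
    iFileCount     => 0,
    iDirCount      => 1,
    iTotalSize     => 2,
    iAgedFileCount => 3,
    iAgedTotalSize => 4,
};

# Hypothetical stand-in for filestat(): counts one file of $size bytes,
# updating the caller's variables in place through the @_ aliases.
sub add_file {
    my $size = pop;    # last argument is the size; the rest stay aliased
    ++$_[iFileCount];
    $_[iTotalSize] += $size;
}

my ($file_count, $dir_count, $total_size,
    $aged_file_count, $aged_total_size) = (0) x 5;
for my $size (100, 250) {
    add_file($file_count, $dir_count, $total_size,
             $aged_file_count, $aged_total_size, $size);
}
print "files=$file_count bytes=$total_size\n";    # files=2 bytes=350
```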

You could also consider File::Recurse, which has some niceties over File::Find (but maybe it isn't being maintained anymore?).

You could probably make your own code even faster than the File::Find version by reworking it to chdir as well and to use the _ trick (as in "-x _"). File::Find will often have to stat a file itself, but your "wanted" routine can't tell when File::Find has already stat()ed it, so you have to stat each file again, and much of the time both you and File::Find end up stat()ing the same files.
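A rough sketch of such a hand-rolled walker, under these assumptions: symlinks are not followed (it uses lstat), the starting directory itself is not counted, and the hypothetical helpers walk and tree_totals_chdir stand in for your own code. Each entry is lstat()ed once by its short relative name, and every further question is answered from _. Whether it actually beats File::Find on your system is something to benchmark rather than assume.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Hypothetical recursive helper: chdir into $dir, lstat each entry once by
# its short name, and answer every other question from the _ buffer.
# Because lstat does not follow symlinks, -d _ is false for a link to a
# directory, so links are never descended into.
sub walk {
    my ($dir, $totals) = @_;
    chdir $dir or do { warn "chdir $dir: $!\n"; return };
    opendir my $dh, '.' or die "opendir $dir: $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;                        # close before recursing to save handles
    for my $name (@entries) {
        lstat($name) or next;            # the only stat system call for this entry
        if (-d _) {
            ++$totals->{dirs};
            walk($name, $totals);        # relative name: we are already in $dir
        } else {
            ++$totals->{files};
            $totals->{bytes} += -s _;
        }
    }
    chdir '..' or die "chdir ..: $!";    # back to the parent of $dir
}

# Hypothetical entry point: remember where we started and return there.
sub tree_totals_chdir {
    my ($root) = @_;
    my $start  = getcwd();
    my %totals = (files => 0, dirs => 0, bytes => 0);
    walk($root, \%totals);
    chdir $start or die "chdir $start: $!";
    return \%totals;
}

# Usage: a throwaway tree with two 3-byte files and one subdirectory.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $f ("$root/x.txt", "$root/sub/y.txt") {
    open my $fh, '>', $f or die "open $f: $!";
    print {$fh} "abc";
    close $fh;
}
my $before = getcwd();
my $t = tree_totals_chdir($root);
print "files=$t->{files} dirs=$t->{dirs} bytes=$t->{bytes}\n";
```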

- tye (but my friends call me "Tye")

In reply to (tye)Re: faster filesystem stats by tye
in thread faster filesystem stats by Anonymous Monk
