If speed is important, prefer readdir over glob (3rd run results):

use strict;
use warnings;
use feature 'say';
use Data::Dump 'dd';
use Time::HiRes 'time';
use File::Glob ':bsd_glob';
use Fcntl ':mode';
use File::stat;
use Win32::LongPath;

STDOUT-> autoflush;

my $dir = 'c:/program files';

{
    print "testing glob... ";
    my $t = time;
    my @to_do = ( $dir );
    my @result;

    while ( my $item = shift @to_do ) {
        my $stat = stat $item;
        next unless $stat;
        my $mode = $stat-> mode;
        if ( $mode & S_IFREG ) {
            push @result, [ $item, $stat-> size ];
        }
        elsif ( $mode & S_IFDIR ) {
            unshift @to_do,
                grep { !m{ /\.{1,2}$ }x }
                bsd_glob( "$item/{.,}*" )
        }
    }
    printf "%d files, %.03f s\n", scalar( @result ), time - $t;
}

{
    print "testing readdir... ";
    my $t = time;
    my @to_do = ( $dir );
    my @result;

    while ( my $item = shift @to_do ) {
        my $stat = statL $item;
        next unless $stat;
        if ( $stat-> { mode } & S_IFREG ) {
            push @result, [ $item, $stat-> { size }];
        }
        elsif ( $stat-> { mode } & S_IFDIR ) {
            my $d = Win32::LongPath-> new;
            $d-> opendirL( $item ) or next;
            unshift @to_do,
                map  { "$item/$_" }
                grep { !m{ ^\.{1,2}$ }x }
                $d-> readdirL;
        }
    }
    printf "%d files, %.03f s\n", scalar( @result ), time - $t;
}

__END__

testing glob... 18670 files, 3.492 s
testing readdir... 18670 files, 1.863 s

Sorry, I've rewritten your code completely; it was for investigation only. (One minor complaint: you wrote grep {} glob(), glob(), where it looks like (grep {} glob()), glob() was intended, but that is irrelevant to the results.) Also irrelevant to speed, and maybe distracting, are some details that ended up in the final script (which is not too DRY to begin with): bsd_glob, File::stat, no file tests as such, and the use of Win32::LongPath itself. The latter is slightly slower than plain opendir/readdir and, if the trees are grown in a controlled environment, not really necessary.
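For reference, here is a minimal sketch of the same walk using only core opendir/readdir, without Win32::LongPath. The sub name sweep_readdir is made up for illustration, and the sketch uses S_ISREG/S_ISDIR from Fcntl instead of the bit tests in the script above. Like that script, it follows links via stat and does no error logging.

```perl
use strict;
use warnings;
use Fcntl ':mode';

# Hypothetical helper (not from the original script): walk a tree with
# core opendir/readdir and return [ path, size ] pairs for regular files.
sub sweep_readdir {
    my ($root) = @_;
    my @to_do = ($root);
    my @result;

    while (@to_do) {
        my $item = shift @to_do;
        my @st   = stat $item;      # follows links, like the script above
        next unless @st;            # unreadable or vanished entry: skip
        my ( $mode, $size ) = @st[ 2, 7 ];

        if ( S_ISREG($mode) ) {
            push @result, [ $item, $size ];
        }
        elsif ( S_ISDIR($mode) ) {
            opendir my $dh, $item or next;
            # Drop '.' and '..', then queue the children depth-first.
            unshift @to_do,
                map  { "$item/$_" }
                grep { $_ ne '.' && $_ ne '..' }
                readdir $dh;
            closedir $dh;
        }
    }
    return @result;
}

# Usage: pass a directory on the command line, default is '.'.
my $dir   = @ARGV ? $ARGV[0] : '.';
my @files = sweep_readdir($dir);
printf "%d files\n", scalar @files;
```

On Windows this will not handle over-long paths, which is the case Win32::LongPath exists for.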

I suspect the explanation is that glob performs a stat on each item it produces (as File::Find does, if I'm not mistaken); string manipulation alone can't make it that much slower. BTW, I observe a similar difference on Linux.
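One way to probe this suspicion is to time the two listing methods on a single directory, with no stat calls in our own code. The sketch below uses the core Benchmark module; list_glob and list_readdir are names invented here. It only isolates the listing cost and does not by itself show what glob does internally (note that bsd_glob also sorts its results by default, which readdir does not).

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';
use File::Glob ':bsd_glob';

# Hypothetical helpers: list the entries of one directory (no recursion,
# no stat in our own code), once via bsd_glob and once via readdir.
sub list_glob {
    my ($dir) = @_;
    # Same pattern as the test script: hidden and non-hidden names,
    # minus the '.' and '..' entries.
    return grep { !m{ /\.{1,2}\z }x } bsd_glob("$dir/{.,}*");
}

sub list_readdir {
    my ($dir) = @_;
    opendir my $dh, $dir or return;
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } @names;
}

# Usage: pass a directory on the command line, default is '.'.
my $dir = @ARGV ? $ARGV[0] : '.';

# Run each variant for at least one CPU second and print a comparison.
cmpthese( -1, {
    glob    => sub { my @entries = list_glob($dir) },
    readdir => sub { my @entries = list_readdir($dir) },
} );
```

A directory with many entries should make any difference easier to see than a nearly empty one.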

There's a cheat in that the same number of files was neatly reported above. The lists may not be the same, and your results may show differing numbers; I get differing numbers for e.g. c:\users. I didn't investigate whether it's an access-rights issue, some specially treated magic directories on Windows, links, etc. By that time I had already discarded all error logging :). Maybe not important if the trees are grown in data-file land.
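To see which paths actually differ between the two runs, rather than only comparing counts, the two result lists can be diffed with a hash. A minimal sketch follows; only_in_first is a made-up helper, and the paths are stand-in data, not real results.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paths present in the first list but
# missing from the second.
sub only_in_first {
    my ( $first, $second ) = @_;
    my %seen = map { $_ => 1 } @$second;
    return grep { !$seen{$_} } @$first;
}

# Stand-in data for illustration only; in practice these would be the
# paths collected by the glob-based and the readdir-based walks.
my @from_glob    = ( 'c:/users/a/x.txt', 'c:/users/a/y.txt' );
my @from_readdir = ( 'c:/users/a/x.txt', 'c:/users/a/z.lnk' );

print "glob only:    $_\n" for only_in_first( \@from_glob,    \@from_readdir );
print "readdir only: $_\n" for only_in_first( \@from_readdir, \@from_glob );
```

With the [ path, size ] pairs from the test script, extract the paths first, e.g. map { $_->[0] } @result.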


In reply to Re: Function to sweep a file tree by vr
in thread Function to sweep a file tree by bojinlund
