in reply to Function to sweep a file tree
If speed is important, prefer readdir over glob (timings from the 3rd run):
```perl
use strict;
use warnings;
use IO::Handle;            # STDOUT->autoflush on older perls
use Time::HiRes 'time';
use File::Glob ':bsd_glob';
use Fcntl ':mode';         # S_ISREG, S_ISDIR
use File::stat;            # object-returning stat() for the glob test
use Win32::LongPath;       # statL, opendirL, readdirL for the readdir test

STDOUT->autoflush;

my $dir = 'c:/program files';

{
    print "testing glob... ";
    my $t = time;
    my @to_do = ( $dir );
    my @result;
    while ( my $item = shift @to_do ) {
        my $stat = stat $item;
        next unless $stat;
        my $mode = $stat->mode;
        if ( S_ISREG( $mode ) ) {
            push @result, [ $item, $stat->size ];
        }
        elsif ( S_ISDIR( $mode ) ) {
            # dot-files and ordinary names, minus "." and ".."
            unshift @to_do, grep { !m{ /\.{1,2}$ }x } bsd_glob( "$item/{.,}*" );
        }
    }
    printf "%d files, %.03f s\n", scalar( @result ), time - $t;
}

{
    print "testing readdir... ";
    my $t = time;
    my @to_do = ( $dir );
    my @result;
    while ( my $item = shift @to_do ) {
        my $stat = statL $item;    # hash ref, false on failure
        next unless $stat;
        if ( S_ISREG( $stat->{mode} ) ) {
            push @result, [ $item, $stat->{size} ];
        }
        elsif ( S_ISDIR( $stat->{mode} ) ) {
            my $d = Win32::LongPath->new;
            $d->opendirL( $item ) or next;
            unshift @to_do, map { "$item/$_" } grep { !m{ ^\.{1,2}$ }x } $d->readdirL;
            $d->closedirL;
        }
    }
    printf "%d files, %.03f s\n", scalar( @result ), time - $t;
}

__END__
testing glob... 18670 files, 3.492 s
testing readdir... 18670 files, 1.863 s
```
Sorry, I've re-written your code completely; it was for investigation only. (One minor complaint: in `grep {...} glob(...), glob(...)` the grep filters both lists, yet it reads as if `(grep {...} glob(...)), glob(...)` was intended. This doesn't affect the results, though.) Also irrelevant to speed, and maybe distracting, are some details that ended up in the final script (which isn't very DRY to begin with): bsd_glob, File::stat, the absence of file tests as such, and the use of Win32::LongPath itself. The latter is slightly slower than plain opendir/readdir and, if the trees are grown in a controlled environment, not really necessary.
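For comparison, a minimal sketch of the same sweep with only core opendir/readdir and ordinary file tests, i.e. without Win32::LongPath. It assumes paths stay within the normal Windows path-length limit (which is exactly what Win32::LongPath lifts). The sub name `sweep_readdir` is hypothetical, and I haven't timed this variant here.

```perl
use strict;
use warnings;

# Hypothetical helper: iterative depth-first sweep using core opendir/readdir.
# Returns a list of [ path, size ] pairs for the regular files under $root.
sub sweep_readdir {
    my ( $root ) = @_;
    my @to_do = ( $root );
    my @result;
    while ( @to_do ) {
        my $item = shift @to_do;
        next unless stat $item;             # one stat call; fills the _ buffer
        if ( -f _ ) {
            push @result, [ $item, ( stat _ )[7] ];    # size from cached stat
        }
        elsif ( -d _ ) {
            opendir my $dh, $item or next;
            # skip "." and "..", prepend children to keep depth-first order
            unshift @to_do, map { "$item/$_" } grep { !/^\.{1,2}$/ } readdir $dh;
            closedir $dh;
        }
    }
    return @result;
}

my $dir   = @ARGV ? $ARGV[0] : '.';
my @files = sweep_readdir( $dir );
printf "%d files\n", scalar @files;
```

Using `stat` once and then `-f _` / `-d _` avoids a second system call per item, which is the same idea as keeping the stat result in `$stat` in the loops above.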
I suspect the explanation is that glob performs a stat on every item it produces (as File::Find does, if I'm not mistaken); it can't be this much slower because of string manipulation alone. BTW, I observe a similar difference on Linux.
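One way to probe this guess is to time only the listing of a single directory, with no recursion and no stat calls of our own. A minimal sketch using the core Benchmark module; `list_glob` and `list_readdir` are hypothetical names. If glob is still much slower here, the cost lies inside glob itself (stat calls or otherwise) rather than in the surrounding string handling; it doesn't prove which.

```perl
use strict;
use warnings;
use File::Glob ':bsd_glob';
use Benchmark 'cmpthese';

# Hypothetical helper: one directory's entries (full paths) via bsd_glob,
# same pattern as the sweep above, minus "." and "..".
sub list_glob {
    my ( $dir ) = @_;
    return grep { !m{/\.{1,2}$} } bsd_glob( "$dir/{.,}*" );
}

# Hypothetical helper: the same entries via opendir/readdir.
sub list_readdir {
    my ( $dir ) = @_;
    opendir my $dh, $dir or return;
    my @entries = map { "$dir/$_" } grep { !/^\.{1,2}$/ } readdir $dh;
    closedir $dh;
    return @entries;
}

my $dir = @ARGV ? $ARGV[0] : '.';

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    glob    => sub { my @x = list_glob( $dir ) },
    readdir => sub { my @x = list_readdir( $dir ) },
} );
```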
There's a cheat, in that the same number of files was neatly reported above, but the lists may not be the same: your results may show differing numbers, and I get differing numbers for e.g. c:\users. I didn't investigate whether that's due to access rights, some specially treated magic directories on Windows, links, etc.; by that time I had already discarded all error logging :). Maybe it's not important if the trees are grown in data-file land.
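To see where two runs diverge, one could keep both `@result` lists and diff them by path. A minimal sketch; `diff_paths` is a hypothetical helper, and the paths in the usage lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: given two array refs of [ path, size ] pairs, return
# two array refs: paths found only in the first list, and only in the second.
sub diff_paths {
    my ( $x, $y ) = @_;
    my %in_x   = map { $_->[0] => 1 } @$x;
    my %in_y   = map { $_->[0] => 1 } @$y;
    my @only_x = sort { $a cmp $b } grep { !$in_y{$_} } keys %in_x;
    my @only_y = sort { $a cmp $b } grep { !$in_x{$_} } keys %in_y;
    return ( \@only_x, \@only_y );
}

# Usage with made-up paths:
my @glob_run    = ( [ 'c:/users/a.txt', 10 ], [ 'c:/users/b.txt', 20 ] );
my @readdir_run = ( [ 'c:/users/a.txt', 10 ], [ 'c:/users/c.txt', 30 ] );
my ( $only_glob, $only_readdir ) = diff_paths( \@glob_run, \@readdir_run );
print "only in glob run:    $_\n" for @$only_glob;
print "only in readdir run: $_\n" for @$only_readdir;
```

To get the reasons as well, the silent `next unless $stat` and `opendirL( $item ) or next` in the loops could be turned into something like `unless ( $stat ) { warn "can't stat $item: $!\n"; next }`; whether `$!` then tells access rights from links or special directories apart is up to the OS.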
Re^2: Function to sweep a file tree
by bojinlund (Monsignor) on Jun 21, 2020 at 06:21 UTC
by vr (Curate) on Jun 21, 2020 at 11:42 UTC
by bojinlund (Monsignor) on Jun 22, 2020 at 05:56 UTC
by vr (Curate) on Jun 22, 2020 at 11:03 UTC
by bojinlund (Monsignor) on Jun 22, 2020 at 11:53 UTC
by Anonymous Monk on Jun 22, 2020 at 08:20 UTC