SkullOne has asked for the wisdom of the Perl Monks concerning the following question:

Brothers,
I am continuing some of my file maintenance scripts, and noticed that either Windows or Perl has some severe speed issues while working with tens of thousands of files in a directory.
It takes anywhere from 30 seconds to 5 minutes to read in the list of files, depending on the directory size (roughly 10k to 400k files). I know Windows, and NTFS especially, does not perform well with that many files in one directory, but I was looking for a faster way to read a directory.
It appears VBScript can do a "getnext" on files, without fetching the full list up front. Is there something like that available in Perl?
It's likely I am not reading the directory correctly, but I can't work out the difference between using opendir and readdir versus just using chdir and doing a foreach loop.
See my code below:
use strict;
use POSIX;
use File::Copy;
use File::Basename;
use File::stat;

my $src;
my $dst;

print "\nEnter Source Directory: ";
chomp( $src = <STDIN> );
$dst = $src;
$dst =~ s/inbox/queue/g;
print "\nDESTINATION DIRECTORY SET TO: $dst\n";
chdir $src or die "Can't chdir to $src!\n";

my $count;
my $limit;
print "\nEnter File Limit: ";
chomp( $limit = <STDIN> );
$count = 1;

for my $file (<*.imap>) {
    $count = $count + 1;
    my ( $name, $path, $suffix ) = fileparse( $file, "\.imap" );
    my $info      = stat($file);
    my $datestamp = strftime( "%Y%m%d", localtime( $info->mtime ) );
    mkdir "$dst\\$datestamp" or warn "Error making Directory $!\n";
    print "\n Moving \"$file\" >> $dst\\$datestamp";
    move $file, "$dst\\$datestamp\\$name$suffix"
        or warn "Cannot copy $file $!\n";
    if ( $count > $limit ) {
        print "\n \nFile Limit Reached. Stopping\n";
        exit;
    }
}

Replies are listed 'Best First'.
Re: Fast file and directory operations
by mr_mischief (Monsignor) on Mar 11, 2008 at 20:04 UTC
    You might check into File::Find. I'm not sure it'll be faster than using foreach to go through a file glob, but I suspect it may.
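    A minimal sketch of what that might look like here (untested; the inbox path is a placeholder, and it only prints the matches rather than moving them):

    use strict;
    use warnings;
    use File::Find;

    # Placeholder standing in for the OP's source directory.
    my $src = '/path/to/inbox';

    # find() calls the callback once per entry under $src; by default it
    # chdirs into each directory and sets $_ to the bare filename.
    find(
        sub {
            return unless /\.imap\z/;       # only the .imap files
            print "$File::Find::name\n";    # full path of the match
        },
        $src,
    );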
Re: Fast file and directory operations
by kyle (Abbot) on Mar 11, 2008 at 20:17 UTC
    for  my $file (<*.imap>)

    I believe this is going to build a list of every file you want to operate on, and then iterate over it. If that list is very long, you're going to spend a lot of memory storing it. The code below does the same thing, but it reads one directory entry at a time.

    opendir my $cwd_dh, '.' or die "Can't opendir: $!";
    while ( my $file = readdir $cwd_dh ) {
        next unless $file =~ m{ \. imap \z }xms;
        # ...
    }
    closedir $cwd_dh;

    I would not expect this to be faster unless the memory used by the one big glob is sending you into swap.

    Have a look at opendir and readdir for details about this method.
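    For completeness, a rough, untested sketch of how the original mkdir/strftime/move logic might drop into such a loop. The paths and limit below are placeholders, forward slashes are used since Perl on Windows accepts them, and the error handling is a guess at what was intended:

    use strict;
    use warnings;
    use POSIX qw(strftime);
    use File::Copy qw(move);
    use File::stat;

    # Placeholders standing in for the values the original script reads in.
    my ( $src, $dst, $limit ) = ( '/path/inbox', '/path/queue', 1000 );

    opendir my $src_dh, $src or die "Can't opendir $src: $!";
    my $count = 0;
    while ( my $file = readdir $src_dh ) {
        next unless $file =~ m{ \. imap \z }xms;

        my $info      = stat("$src/$file") or next;
        my $datestamp = strftime( "%Y%m%d", localtime( $info->mtime ) );

        mkdir "$dst/$datestamp";    # may already exist; that's fine here
        move( "$src/$file", "$dst/$datestamp/$file" )
            or warn "Cannot move $file: $!\n";

        last if ++$count >= $limit;
    }
    closedir $src_dh;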

Re: Fast file and directory operations
by halfcountplus (Hermit) on Mar 11, 2008 at 20:30 UTC
    "mr_mischief" beat me to the punch whilst typing, but File::Find works to return full pathnames of files in a directory without the need for chdir, readdir, or opendir. Then you can stat, move etc., using those full paths. Of course, you already have the pathname! so i don't understand why you need to chdir anyway (eg. won't "for my $file (<$src/*.imap>) work?), but here's a File::Find demo to consider:
    #!/usr/bin/perl
    use strict;
    use File::Find;

    my @dirs  = ("/home");
    my @files = ();

    sub dirscan {
        if ( $_ =~ /\.imap$/ ) { push @files, $File::Find::name }
    }

    find( \&dirscan, @dirs );
    foreach (@files) { print "$_\n" }
    Positive thinking leads me to believe it will be faster.
      HOWEVER, File::Find is mandatorily recursive -- or at least, if anyone knows how to change that, I'd be happy to know.

      Try counting slashes and prune:

      #!/usr/bin/perl
      # linux only
      use warnings;
      use strict;
      use File::Find;
      use File::Spec;

      if ( @ARGV < 2 ) { print "Usage: $0 dir depth\n"; exit }
      my ( $path, $depth ) = @ARGV;
      my $abs_path = File::Spec->rel2abs($path);    # in case you enter . for dir
      my $m = ($abs_path) =~ tr!/!!;                # count slashes in top path

      find( \&found, $abs_path );
      exit;

      sub found {
          my $n = ($File::Find::name) =~ tr!/!!;    # count slashes in file
          return $File::Find::prune = 1 if $n > ( $m + $depth );
          # do stuff here.
          #print "$_\n";                   # name only
          print "$File::Find::name\n";     # name with full path
      }

      or use File::Find::Rule

      #!/usr/bin/perl
      use warnings;
      use strict;
      use File::Find::Rule;

      my $word      = shift || 'perl';
      my $directory = shift || '/home/zentara';
      my $depth     = shift || 3;

      # find all the files of a given directory, down to $depth
      my $rule1 = File::Find::Rule->new;
      $rule1->maxdepth($depth);
      $rule1->file;
      $rule1->grep(qr/\Q$word\E/);
      #$rule1->name( '*.pm' );

      my @files = $rule1->in($directory);
      print "@files\n";

      I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: Fast file and directory operations
by Roy Johnson (Monsignor) on Mar 11, 2008 at 19:55 UTC
    I can't find the difference between opendir and readdir, and just using chdir and doing a foreach loop
    Try using a while loop with readdir.

    Update: I may have misread. I thought you were saying you used a foreach with readdir. See opendir slower than ls on large dirs?.
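    The distinction, roughly: a foreach evaluates readdir in list context and so builds the entire directory listing before the loop starts, while a while loop calls readdir in scalar context and gets one entry per iteration. A small illustration (the handle name is arbitrary):

    opendir my $dh, '.' or die "Can't opendir: $!";

    # List context: readdir returns every entry at once, so the whole
    # listing is in memory before the first iteration.
    foreach my $file (readdir $dh) {
        # ...
    }

    rewinddir $dh;

    # Scalar context: one entry per call, nothing accumulated.
    while ( defined( my $file = readdir $dh ) ) {
        # ...
    }

    closedir $dh;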


    Caution: Contents may have been coded under pressure.