SkullOne has asked for the wisdom of the Perl Monks concerning the following question:

Brothers,
I am continuing some of my file maintenance scripts, and noticed that either Windows or Perl has some severe speed issues while working with tens of thousands of files in a directory.
It takes anywhere from 30 seconds to 5 minutes to read in the list of files, depending on the directory size (roughly 10k to 400k files). I know Windows, and NTFS especially, does not perform well with that many files in one directory, but I was looking for a faster way to read a directory.
It appears VBScript can do a "getnext" on files, without fetching the full list up front. Is there something like that available in Perl?
It's likely I am not reading the directory correctly, but I can't work out the difference between using opendir and readdir versus just using chdir and doing a foreach loop.
See my code below:
use strict;
use POSIX;
use File::Copy;
use File::Basename;
use File::stat;

my $src;
my $dst;

print "\nEnter Source Directory: ";
chomp( $src = <STDIN> );
$dst = $src;
$dst =~ s/inbox/queue/g;
print "\nDESTINATION DIRECTORY SET TO: $dst\n";
chdir $src or die "Can't chdir to $src!\n";

my $count;
my $limit;
print "\nEnter File Limit: ";
chomp( $limit = <STDIN> );
$count = 1;

for my $file (<*.imap>) {
    $count = $count + 1;
    my ( $name, $path, $suffix ) = fileparse( $file, "\.imap" );
    my $info      = stat($file);
    my $datestamp = strftime( "%Y%m%d", localtime( $info->mtime ) );
    mkdir "$dst\\$datestamp" or warn "Error making Directory $!\n";
    print "\n Moving \"$file\" >> $dst\\$datestamp";
    move $file, "$dst\\$datestamp\\$name$suffix"
        or warn "Cannot copy $file $!\n";
    if ( $count > $limit ) {
        print "\n \nFile Limit Reached. Stopping\n";
        exit;
    }
}

Replies are listed 'Best First'.
Re: Fast file and directory operations
by mr_mischief (Monsignor) on Mar 11, 2008 at 20:04 UTC
    You might check into File::Find. I'm not sure it'll be faster than using foreach to go through a file glob, but I suspect it may.
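    A minimal sketch of what that might look like here (untested; the inbox path is a placeholder, and it only prints the matches rather than moving them):

    use strict;
    use warnings;
    use File::Find;

    # Placeholder standing in for the OP's source directory.
    my $src = '/path/to/inbox';

    # find() calls the callback once per entry under $src; by default it
    # chdirs into each directory and sets $_ to the bare filename.
    find(
        sub {
            return unless /\.imap\z/;       # only the .imap files
            print "$File::Find::name\n";    # full path of the match
        },
        $src,
    );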
Re: Fast file and directory operations
by kyle (Abbot) on Mar 11, 2008 at 20:17 UTC
    for  my $file (<*.imap>)

    I believe this is going to build a list of every file you want to operate on, and then iterate over it. If that list is very long, you're going to spend a lot of memory storing it. The code below does the same thing, but it reads one directory entry at a time.

    opendir my $cwd_dh, '.' or die "Can't opendir: $!";
    while ( my $file = readdir $cwd_dh ) {
        next unless $file =~ m{ \. imap \z }xms;
        # ...
    }
    closedir $cwd_dh;

    I would not expect this to be faster unless the memory used by the one big glob is sending you into swap.

    Have a look at opendir and readdir for details about this method.
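    For completeness, a rough, untested sketch of how the original mkdir/strftime/move logic might drop into such a loop. The paths and limit below are placeholders, forward slashes are used since Perl on Windows accepts them, and the error handling is a guess at what was intended:

    use strict;
    use warnings;
    use POSIX qw(strftime);
    use File::Copy qw(move);
    use File::stat;

    # Placeholders standing in for the values the original script reads in.
    my ( $src, $dst, $limit ) = ( '/path/inbox', '/path/queue', 1000 );

    opendir my $src_dh, $src or die "Can't opendir $src: $!";
    my $count = 0;
    while ( my $file = readdir $src_dh ) {
        next unless $file =~ m{ \. imap \z }xms;

        my $info      = stat("$src/$file") or next;
        my $datestamp = strftime( "%Y%m%d", localtime( $info->mtime ) );

        mkdir "$dst/$datestamp";    # may already exist; that's fine here
        move( "$src/$file", "$dst/$datestamp/$file" )
            or warn "Cannot move $file: $!\n";

        last if ++$count >= $limit;
    }
    closedir $src_dh;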

Re: Fast file and directory operations
by halfcountplus (Hermit) on Mar 11, 2008 at 20:30 UTC
    "mr_mischief" beat me to the punch whilst typing, but File::Find works to return full pathnames of files in a directory without the need for chdir, readdir, or opendir. Then you can stat, move etc., using those full paths. Of course, you already have the pathname! so i don't understand why you need to chdir anyway (eg. won't "for my $file (<$src/*.imap>) work?), but here's a File::Find demo to consider:
    #!/usr/bin/perl
    use strict;
    use File::Find;

    my @dirs  = ("/home");
    my @files = ();

    sub dirscan {
        if ( $_ =~ /\.imap$/ ) { push @files, $File::Find::name }
    }

    find( \&dirscan, @dirs );
    foreach (@files) { print "$_\n" }
    Positive thinking leads me to believe it will be faster.
      HOWEVER, File::Find is mandatorily recursive -- or at least, if anyone knows how to change that, I'd be happy to know.

      Try counting slashes and prune:

      #!/usr/bin/perl
      # linux only
      use warnings;
      use strict;
      use File::Find;
      use File::Spec;

      if ( @ARGV < 2 ) { print "Usage: $0 dir depth\n"; exit }
      my ( $path, $depth ) = @ARGV;
      my $abs_path = File::Spec->rel2abs($path);    # in case you enter . for dir
      my $m = ($abs_path) =~ tr!/!!;                # count slashes in top path

      find( \&found, $abs_path );
      exit;

      sub found {
          my $n = ($File::Find::name) =~ tr!/!!;    # count slashes in file
          return $File::Find::prune = 1 if $n > ( $m + $depth );
          # do stuff here.
          #print "$_\n";                   # name only
          print "$File::Find::name\n";     # name with full path
      }

      or use File::Find::Rule

      #!/usr/bin/perl
      use warnings;
      use strict;
      use File::Find::Rule;

      my $word      = shift || 'perl';
      my $directory = shift || '/home/zentara';
      my $depth     = shift || 3;

      # find all the files of a given directory, down to $depth
      my $rule1 = File::Find::Rule->new;
      $rule1->maxdepth($depth);
      $rule1->file;
      $rule1->grep(qr/\Q$word\E/);
      #$rule1->name( '*.pm' );

      my @files = $rule1->in($directory);
      print "@files\n";

      I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: Fast file and directory operations
by Roy Johnson (Monsignor) on Mar 11, 2008 at 19:55 UTC
    I can't find the difference between opendir and readdir, and just using chdir and doing a foreach loop
    Try using a while loop with readdir.

    Update: I may have misread. I thought you were saying you used a foreach with readdir. See opendir slower than ls on large dirs?.
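    The distinction, roughly: a foreach evaluates readdir in list context and so builds the entire directory listing before the loop starts, while a while loop calls readdir in scalar context and gets one entry per iteration. A small illustration (the handle name is arbitrary):

    opendir my $dh, '.' or die "Can't opendir: $!";

    # List context: readdir returns every entry at once, so the whole
    # listing is in memory before the first iteration.
    foreach my $file (readdir $dh) {
        # ...
    }

    rewinddir $dh;

    # Scalar context: one entry per call, nothing accumulated.
    while ( defined( my $file = readdir $dh ) ) {
        # ...
    }

    closedir $dh;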


    Caution: Contents may have been coded under pressure.