recursive directory question

Aquilo has asked for the wisdom of the Perl Monks concerning the following question:

I *just* started using Perl, but I am trying to learn. Explanations of what is wrong are more useful to me than simply fixing the problem.

I have seen many posts about performing a function on every file in a directory, but none about working at the directory level.

I need a script that:
(UNIX only) Recurses through a directory structure and checks whether more than half of the files in each directory have been used in the past 180 days. The path of each directory that is predominantly unused is appended to a list of directories, which will be used to archive them.

The area I'm most concerned with is my recursive function itself. If there is a more efficient way of doing this, please explain that as well.


#!usr/bin/perl 

# Author: Ryan Scadlock    
# Date: 6 July 2002 

use strict;

my $path = './'; 
  die "The file $base_file does not exist!\n" if (!-f $base_file); 

# Initiate the recursion 
&RecurseDirs($path); 

rm temp;
print "The result can be found at Results file in this dir";

#### SUBROUTINES SECTION #### 

# Function that recurses through the directory tree 
sub RecurseDirs 
{ 
    my ($path) = @_; 
    my $file;    #Variable for a file 

    foreach $path($path){
    
       opendir (DIRECTORY, $path) || die "Can't read $path\n"; 
    
     if (-d "$path$file/") { #If it's a directory... 
        
        # Recurse again through this directory 
        &RecurseDirs("$path$file/"); 

        my $unused = 0;    #Counter for how many files are accessed
        my $count = 0;    #Counter for how many files are in dir
        
        # Count unused files in dir
        $path find -type f -atime +180 >temp.txt;
        $str = wc temp -l;
        $unused = int::substr($str, 0, 1); #returns the first character as an int
        
        # Count files in dir
        $path find -type f >temp.txt;
        $str = wc temp -l;
        $count = int::substr($str, 0, 1); #returns the first character as an int
        
        # Compare 
        # Yes: write full dir path to file
        if (($unused * 2) > $count){
        $path >>result.txt;
        } 
    }
    closedir (DIRECTORY); 
    } 
}

•Re: recursive directory question by merlyn (Sage) on Jul 08, 2002 at 04:38 UTC
    use File::Find;

    my %ages; # $ages{$dirname}{old}, $ages{$dirname}{new}

    find sub {
      return unless -f;
      $ages{$File::Find::dir}{-M _ > 180 ? 'old' : 'new'}++;
    }, "."; # put your topdirs here

    for (sort keys %ages) {
      if ($ages{$_}{old} > $ages{$_}{new}) {
        print "$_ has more old than new\n";
      }
    }

-- Randal L. Schwartz, Perl hacker

update: Hey, that was actually useful! I found three directories I could archive! Neat! Thanks for the idea. I'll put it into a mini-snippets article for one of my upcoming column articles!

Re: •Re: recursive directory question by Fletch (Bishop) on Jul 08, 2002 at 11:34 UTC
Trivial improvement: stick `my $top = shift || ".";` before the `find` and change the `"."` to `$top` and it's even more useful.

Re^2: recursive directory question by flounder99 (Friar) on Jul 08, 2002 at 12:09 UTC
I have a question: what does the `-M _` do? I understand that `-M` returns the modification time (since script start), but what is the `_`? I see this in the `File::Find` man page, but I don't have a good grip on what is going on. Does it mean `$_`? If so, why not use `$_`?

Anyway, nice solution.

-- flounder

•Re: Re^2: recursive directory question by merlyn (Sage) on Jul 08, 2002 at 12:47 UTC
The underscore filehandle (one of the few features of Perl which I can claim to have introduced) means "don't actually perform a stat, but use the information cached from the most recent other stat". Well, the docs in perlfunc say it better:

If any of the file tests (or either the "stat" or "lstat" operators) are given the special filehandle consisting of a solitary underline, then the stat structure of the previous file test (or stat operator) is used, saving a system call. (This doesn't work with "-t", and you need to remember that lstat() and "-l" will leave values in the stat structure for the symbolic link, not the real file.) (Also, if the stat buffer was filled by a "lstat" call, "-T" and "-B" will reset it with the results of "stat _"). Example:

    print "Can do.\n" if -r $a || -w _ || -x _;

    stat($filename);
    print "Readable\n" if -r _;
    print "Writable\n" if -w _;
    print "Executable\n" if -x _;
    print "Setuid\n" if -u _;
    print "Setgid\n" if -g _;
    print "Sticky\n" if -k _;
    print "Text\n" if -T _;
    print "Binary\n" if -B _;

-- Randal L. Schwartz, Perl hacker

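To see the cached stat in isolation, here is a minimal sketch of the same `-f` followed by `-M _` pattern outside of File::Find. The helper name `classify_age`, its 180-day default, and the scratch file from File::Temp are assumptions made for illustration; none of them come from the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: classify one path as 'old' or 'new' by modification
# age in days; returns nothing if the path is not a plain file.
sub classify_age {
    my ($path, $days) = @_;
    $days = 180 unless defined $days;
    return unless -f $path;                 # this test performs the stat
    return -M _ > $days ? 'old' : 'new';    # "_" reuses that cached stat
}

# Scratch file so the sketch is self-contained; removed at program exit.
my ($fh, $name) = tempfile(UNLINK => 1);
close $fh;

print "$name is ", classify_age($name), "\n";
```

Only the `-f $path` test touches the filesystem; `-M _` reads the age from the stat buffer that test left behind, which is why the one-liner inside the `find` callback costs a single system call per file.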
Re^4: recursive directory question by flounder99 (Friar) on Jul 08, 2002 at 13:57 UTC
Re: recursive directory question by jarich (Curate) on Jul 08, 2002 at 08:17 UTC
G'day Aquilo

In the interests of explaining what is wrong with your code I've written a point by point summary of various lines of your code. Like merlyn I like the problem, and I hope that my advice here may be of help. I must say though that merlyn's solution or any other solution that uses File::Find is guaranteed to be better than the improvements I suggest to your code.

Of course, doing it all by hand is nowhere near as fast as getting that lovely module File::Find to do it for you, so use merlyn's solution in preference to this.

Hope it helps.

jarich

Update: Added test for symbolic link, as per Aristotle's reminder. I'm sure there are more gotchas here.

Re^2: recursive directory question by Aristotle (Chancellor) on Jul 08, 2002 at 13:25 UTC
Careful.. you do not test for symlinks. Do an `ln -s . snaretrap` and watch the code recurse forever. That is another (and more important) reason to rely on File::Find - it correctly treats all the gotchas that can come up when traversing directories.

Makeshifts last the longest.

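For readers who want to see the trap in code, a hand-rolled walk that skips symlinks might look roughly like the sketch below. The sub name `walk_dirs`, its callback interface, and the temporary directory used for the demonstration are all made up for illustration; this is not jarich's code, and File::Find remains the better choice.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: call $callback->($dir) for $top and for every real
# subdirectory beneath it, never following symbolic links.
sub walk_dirs {
    my ($top, $callback) = @_;
    $callback->($top);

    opendir(my $dh, $top) or die "Can't read $top: $!\n";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    for my $entry (@entries) {
        my $path = "$top/$entry";
        next if -l $path;                        # skip symlinks such as "snaretrap"
        walk_dirs($path, $callback) if -d $path; # recurse into real directories only
    }
}

# Demonstration tree: one real subdirectory plus a link back to the top.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir failed: $!\n";
eval { symlink $root, "$root/snaretrap" };       # best effort; needs symlink support

my @seen;
walk_dirs($root, sub { push @seen, $_[0] });
print "$_\n" for @seen;
```

Without the `-l` test, the `snaretrap` link would be treated as a directory and the walk would descend into the same tree again and again.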
Re: recursive directory question by Aquilo (Initiate) on Jul 08, 2002 at 16:31 UTC
First, thanks for all the suggestions and comments. Particularly merlyn for his File::Find solution and jarich for his excellent explanation of my code and its errors.

This is now the most up to date version of the problem:

    #!/usr/bin/perl -w

    # Author: Ryan Scadlock with significant help
    #         from Perlmonks merlyn, jarich, and Fletch
    # Date: 6 July 2002
    # Description: (UNIX only) Recurses through a directory structure and
    # checks if more than half of the files in that directory have been used
    # in the past 180 days. The path of each directory which is predominantly
    # unused is appended to a list.

    use File::Find;

    # file's date of last access is determining factor
    my %access; #$access{$dirname}{old}, $access{$dirname}{new}
    my $top = shift || ".";

    find sub {
      return unless -f;
      $access{$File::Find::dir}{-A _ > 180 ? 'old' : 'new'}++; #date
    }, $top; # put your topdirs here

    for (sort keys %access) {
      if ($access{$_}{old} > $access{$_}{new}) {
        print "$_ has more old than new\n";
      }
    }

A few problems remain.

1) This script needs to output to a file. The tree it will be searching contains thousands of directories, which would make that "print" scroll off the top.

2) There is a problem with the line `($ages{$_}{old} > $ages{$_}{new}) {`. I get a "use of uninitialized value in numeric gt at ..." message from the interpreter.

However, I am many hours closer to a working script than I was. So: Thanks

Re: Re: recursive directory question by jarich (Curate) on Jul 09, 2002 at 01:02 UTC
    # Usage: find_archive [directory 1] [directory 2] ... [directory n]

    use File::Find;

    # file's date of last access is determining factor
    my %access; #$access{$dirname}{old}, $access{$dirname}{new}

    my @top_dirs = @ARGV;                  # allows us to search over multiple trees
                                           # eg /home/jarich/ /home/aquilo/ if
                                           # specified on command line
    push @top_dirs, "." unless @top_dirs;  # default behaviour

    my $results_file = "./archive_results.txt";

    find sub {
      return unless -f;
      $access{$File::Find::dir}{-A _ > 180 ? 'old' : 'new'}++;
    }, @top_dirs;

    my @results;
    for (sort keys %access) {
      $access{$_}{old} ||= 0;   # defaults to avoid warnings
      $access{$_}{new} ||= 0;
      if ($access{$_}{old} > $access{$_}{new}) {
        push @results, "$_";    # keep results rather than printing
      }
    }

    # Dump all to file.
    open RESULTS, "> $results_file"
      or die "Failed to open $results_file for writing: $!\n";
    print RESULTS join("\n", @results), "\n";
    close RESULTS;

Update: Added $! to open statement and extra newline.

•Re: Re: Re: recursive directory question by merlyn (Sage) on Jul 09, 2002 at 01:07 UTC
    open RESULTS, "> $results_file" or die "Failed to open $results_file for writing\n";
    print RESULTS join("\n", @results);
    close RESULTS;

That's missing the final newline (a common mistake). Perhaps you wanted:

    open RESULTS, ">$results_file" or die;
    print RESULTS "$_\n" for @results;
    close RESULTS;

And as for:

    $access{$_}{old} ||= 0;   # defaults to avoid warnings
    $access{$_}{new} ||= 0;
    if ($access{$_}{old} > $access{$_}{new}) {
      push @results, "$_";    # keep results rather than printing
    }

I'd recast that as:

    if (($access{$_}{old} || 0) > ($access{$_}{new} || 0)) {
      push @results, "$_";    # keep results rather than printing
    }

mostly because I hate changing values when all I'm really trying to do is test them. One of the many reasons why I'm not religious about enabling `-w`.

-- Randal L. Schwartz, Perl hacker

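The difference between the two forms can be checked on a small hand-made hash. The sketch below is only an illustration: the sample directory names, their counts, and the helper name `mostly_old` are invented, and the comparison is the non-mutating `|| 0` form from the reply above.

```perl
use strict;
use warnings;

# Hypothetical sample data: per-directory counts with one side sometimes missing.
my %access = (
    './attic'  => { old => 5 },            # no 'new' key at all
    './active' => { old => 1, new => 4 },
    './fresh'  => { new => 2 },            # no 'old' key at all
);

# Hypothetical helper: names of directories where old files outnumber new ones.
# A missing count is treated as 0 for the comparison only; nothing is stored.
sub mostly_old {
    my ($counts) = @_;
    return grep {
        ($counts->{$_}{old} || 0) > ($counts->{$_}{new} || 0)
    } sort keys %$counts;
}

my @results = mostly_old(\%access);
print "$_\n" for @results;    # prints ./attic
```

Even with warnings enabled this emits no "uninitialized value" message, and the `./attic` entry still has no `new` key afterwards, whereas the `||= 0` version would have added one.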
Re: recursive directory question by gooch (Monk) on Sep 27, 2002 at 21:08 UTC
Granted, significant time has passed on this thread, but I figured it couldn't hurt to note that the code provided by jarich will run just fine under Win32, not just under Unix. FWIW

-M Gucciard