Aquilo has asked for the wisdom of the Perl Monks concerning the following question:

I *just* started using Perl, but I'm trying to learn. Explanations of what is wrong are more useful to me than simply fixing the problem.

I have seen many posts about performing a function on every file in a directory, but none about working at the directory level.

I need a script that:
(UNIX only) Recurses through a directory structure and, for each directory, checks whether more than half of the files in it have been used in the past 180 days. The path of each directory that is predominantly unused is appended to a list of directories, which will then be used to archive them.

The area I'm most concerned with is my recursive function itself. And if there is a more efficient way of doing this, please explain that as well.

#!usr/bin/perl
# Author: Ryan Scadlock
# Date: 6 July 2002
use strict;

my $path = './';
die "The file $base_file does not exist!\n" if (!-f $base_file);

# Initiate the recursion
&RecurseDirs($path);
rm temp;
print "The result can be found at Results file in this dir";

#### SUBROUTINES SECTION ####

# Function that recurses through the directory tree
sub RecurseDirs {
    my ($path) = @_;
    my $file;    #Variable for a file

    foreach $path($path){
        opendir (DIRECTORY, $path) || die "Can't read $path\n";
        if (-d "$path$file/") {    #If it's a directory...
            # Recurse again through this directory
            &RecurseDirs("$path$file/");
            my $unused = 0;    #Counter for how many files are accessed
            my $count = 0;     #Counter for how many files are in dir

            # Count unused files in dir $path
            find -type f -atime +180 >temp.txt;
            $str = wc temp -l;
            $unused = int::substr($str, 0, 1);    #returns the first character as an int

            # Count files in dir $path
            find -type f >temp.txt;
            $str = wc temp -l;
            $count = int::substr($str, 0, 1);    #returns the first character as an int

            # Compare
            # Yes: write full dir path to file
            if (($unused * 2) > $count){
                $path >>result.txt;
            }
        }
        closedir (DIRECTORY);
    }
}

•Re: recursive directory question
by merlyn (Sage) on Jul 08, 2002 at 04:38 UTC
    use File::Find;
    my %ages; # $ages{$dirname}{old}, $ages{$dirname}{new}
    find sub {
      return unless -f;
      $ages{$File::Find::dir}{-M _ > 180 ? 'old' : 'new'}++;
    }, "."; # put your topdirs here
    for (sort keys %ages) {
      if ($ages{$_}{old} > $ages{$_}{new}) {
        print "$_ has more old than new\n";
      }
    }

    -- Randal L. Schwartz, Perl hacker


    update: Hey, that was actually useful! I found three directories I could archive! Neat! Thanks for the idea. I'll put it into a mini-snippets article for one of my upcoming column articles!

      Trivial improvement: stick my $top = shift || "."; before the find and change the "." to $top and it's even more useful.
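A minimal sketch of merlyn's snippet with that improvement folded in, wrapped in a subroutine so the top directory and the age threshold are parameters. The sub name `age_counts` and the returned hash reference are illustrative choices, not part of the original posts. It keeps merlyn's `-M` (days since last modification); since the question is really about access time, `-A` is the drop-in replacement, as the later version of the script in this thread does.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: merlyn's snippet with the starting directory and
# the age threshold (in days) passed in instead of hard-coded.
# Returns a hash ref shaped like { $dirname => { old => N, new => M } };
# a bucket key only exists if at least one file fell into it.
sub age_counts {
    my ($top, $days) = @_;
    my %ages;
    find sub {
        return unless -f;    # plain files only; this stat is cached in "_"
        $ages{$File::Find::dir}{ -M _ > $days ? 'old' : 'new' }++;
    }, $top;
    return \%ages;
}
```

Called as `age_counts(shift || ".", 180)`, it behaves like the one-off script run with an optional directory argument, and the returned hash can be compared bucket by bucket just like `%ages` in the loop above.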

      I have a question: what does the -M _ do? I understand that -M returns the file's age since its last modification (in days, measured from script start), but what is the _? I see this in the File::Find man page, but I don't have a good grip on what is going on. Does it mean $_? If so, why not use $_?

      Anyway, nice solution.

      --

      flounder

        The underscore filehandle (one of the few features of Perl I can claim to have introduced) means "don't actually perform a stat, but use the information cached from the most recent other stat". Well, the docs in perlfunc say it better:
        If any of the file tests (or either the "stat" or "lstat" operators) are given the special filehandle consisting of a solitary underline, then the stat structure of the previous file test (or stat operator) is used, saving a system call. (This doesn't work with "-t", and you need to remember that lstat() and "-l" will leave values in the stat structure for the symbolic link, not the real file.) (Also, if the stat buffer was filled by a "lstat" call, "-T" and "-B" will reset it with the results of "stat _"). Example:

            print "Can do.\n" if -r $a || -w _ || -x _;

            stat($filename);
            print "Readable\n" if -r _;
            print "Writable\n" if -w _;
            print "Executable\n" if -x _;
            print "Setuid\n" if -u _;
            print "Setgid\n" if -g _;
            print "Sticky\n" if -k _;
            print "Text\n" if -T _;
            print "Binary\n" if -B _;

        -- Randal L. Schwartz, Perl hacker
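A small self-contained illustration of the perlfunc passage quoted above: one explicit `stat` fills Perl's stat buffer, and the later file tests on `_` read from that buffer instead of asking the operating system again. The temporary file and the variable names exist only for this demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file containing exactly six bytes ("hello\n");
# UNLINK => 1 removes it when the program exits.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close $fh or die "close: $!";

# One real stat() system call fills Perl's stat buffer...
stat($filename) or die "stat $filename: $!";

# ...and each test on the special "_" filehandle reuses that buffer,
# so these three checks make no further stat calls.
my $is_file = -f _;
my $size    = -s _;
my $age     = -M _;    # days since last modification, relative to script start

print "plain file, $size bytes\n" if $is_file;
```

This is the same reason `-M _` in the File::Find callback is cheap: the preceding `-f` already did the stat, and `_` simply reuses it. It is not the same thing as `$_`, which holds the file name.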

Re: recursive directory question
by jarich (Curate) on Jul 08, 2002 at 08:17 UTC
    G'day Aquilo

    In the interests of explaining what is wrong with your code, I've written a point-by-point summary of various lines of your code. Like merlyn, I like the problem, and I hope that my advice here is of help. I must say, though, that merlyn's solution, or any other solution that uses File::Find, is guaranteed to be better than the improvements I suggest to your code.

    Of course, doing it all by hand is nowhere near as fast as getting that lovely module File::Find to do it for you, so use merlyn's solution in preference to this.

    Hope it helps.

    jarich

    Update: Added test for symbolic link, as per Aristotle's reminder. I'm sure there are more gotchas here.

      Careful: you do not test for symlinks. Do an ln -s . snaretrap and watch the code recurse forever. That is another (and more important) reason to rely on File::Find: it correctly handles all the gotchas that can come up when traversing directories.

      Makeshifts last the longest.
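For comparison, a minimal hand-rolled sketch of the recursion the original question was attempting, written without shelling out to `find` and `wc`. It assumes the same 180-day access-time rule, keeps per-directory counts in a hash instead of temp files, and skips symbolic links with `-l` so a loop like `ln -s . snaretrap` is never followed. The sub names `walk_dirs` and `mostly_stale` are hypothetical; this is not the code from any reply in the thread, and File::Find remains the safer choice.

```perl
use strict;
use warnings;

# Hypothetical hand-rolled walker. For every directory under $dir it records
#   $stats->{$dir}{count}  number of plain files directly in $dir
#   $stats->{$dir}{stale}  how many of those were last accessed > $days ago
sub walk_dirs {
    my ($dir, $days, $stats) = @_;
    $stats ||= {};

    opendir(my $dh, $dir) or do { warn "Can't read $dir: $!\n"; return $stats };
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    for my $name (@names) {
        my $path = "$dir/$name";
        next if -l $path;    # lstat(): skip symlinks so "ln -s . snaretrap" can't loop
        if (-d _) {          # "_" reuses the lstat buffer filled by -l
            walk_dirs($path, $days, $stats);
        }
        elsif (-f _) {
            $stats->{$dir}{count}++;
            $stats->{$dir}{stale}++ if -A _ > $days;
        }
    }
    return $stats;
}

# Directories where more than half of the files are stale (stale * 2 > count),
# the same comparison the original script was trying to make.
sub mostly_stale {
    my ($stats) = @_;
    return grep { ($stats->{$_}{stale} || 0) * 2 > $stats->{$_}{count} }
           sort keys %$stats;
}
```

Counting with Perl's own file tests also avoids the `substr($str, 0, 1)` problem in the original, which would only ever read the first digit of a count.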

Re: recursive directory question
by Aquilo (Initiate) on Jul 08, 2002 at 16:31 UTC

    First, thanks for all the suggestions and comments, particularly merlyn for his File::Find solution and jarich for his excellent explanation of my code and its errors.

    This is now the most up to date version of the problem:

    #!/usr/bin/perl -w
    # Author: Ryan Scadlock with significant help
    # from Perlmonks merlyn, jarich, and Fletch
    # Date: 6 July 2002
    # Description: (UNIX only) Recurses through a directory structure and
    # checks if more than half of the files in each directory have been used
    # in the past 180 days. The paths of directories which are predominantly
    # unused are appended to a list.
    use File::Find;

    # file's date of last access is determining factor
    my %access; #$access{$dirname}{old}, $access{$dirname}{new}
    my $top = shift || ".";
    find sub {
      return unless -f;
      $access{$File::Find::dir}{-A _ > 180 ? 'old' : 'new'}++; #date
    }, $top; # put your topdirs here

    for (sort keys %access) {
      if ($access{$_}{old} > $access{$_}{new}) {
        print "$_ has more old than new\n";
      }
    }

    A few problems remain:
    1) This script needs to output to a file. The tree it will be searching contains thousands of directories, which would make that "print" scroll off the top.
    2) There is a problem with the line
    if ($access{$_}{old} > $access{$_}{new}) {
    I get a "Use of uninitialized value in numeric gt (>) at ..." warning from the interpreter.
    However, I am many hours closer to a working script than I was. So: Thanks

      # Usage: find_archive [directory 1] [directory 2] ... [directory n]
      use File::Find;

      # file's date of last access is determining factor
      my %access; #$access{$dirname}{old}, $access{$dirname}{new}

      my @top_dirs = @ARGV;    # allows us to search over multiple trees
                               # eg /home/jarich/ /home/aquilo/ if
                               # specified on command line
      push @top_dirs, "." unless @top_dirs;    # default behaviour

      my $results_file = "./archive_results.txt";

      find sub {
        return unless -f;
        $access{$File::Find::dir}{-A _ > 180 ? 'old' : 'new'}++;
      }, @top_dirs;

      my @results;
      for (sort keys %access) {
        $access{$_}{old} ||= 0;    # defaults to avoid warnings
        $access{$_}{new} ||= 0;
        if ($access{$_}{old} > $access{$_}{new}) {
          push @results, "$_";     # keep results rather than printing
        }
      }

      # Dump all to file.
      open RESULTS, "> $results_file"
        or die "Failed to open $results_file for writing: $!\n";
      print RESULTS join("\n", @results), "\n";
      close RESULTS;

      Update: Added $! to open statement and extra newline.

        open RESULTS, "> $results_file" or die "Failed to open $results_file for writing\n";
        print RESULTS join("\n", @results);
        close RESULTS;
        That's missing the final newline (a common mistake). Perhaps you wanted:
        open RESULTS, ">$results_file" or die;
        print RESULTS "$_\n" for @results;
        close RESULTS;
        And as for:
        $access{$_}{old} ||= 0;    # defaults to avoid warnings
        $access{$_}{new} ||= 0;
        if ($access{$_}{old} > $access{$_}{new}) {
          push @results, "$_";     # keep results rather than printing
        }
        I'd recast that as:
        if (($access{$_}{old} || 0) > ($access{$_}{new} || 0)) {
          push @results, "$_";     # keep results rather than printing
        }
        mostly because I hate changing values when all I'm really trying to do is test them. One of the many reasons why I'm not religious about enabling -w.

        -- Randal L. Schwartz, Perl hacker
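Putting merlyn's two suggestions together, a minimal sketch that assumes the same `%access` structure as the script above: it tests with `|| 0` instead of writing defaults back into the hash, and prints one path per line so the file always ends with a newline. The sub names are hypothetical, and the three-argument `open` with a lexical filehandle is a swapped-in modern idiom rather than what the posters used.

```perl
use strict;
use warnings;

# Hypothetical helper: return the directories whose 'old' count beats their
# 'new' count. A missing bucket is treated as 0, and %$access is not modified.
sub predominantly_old {
    my ($access) = @_;
    return grep {
        ($access->{$_}{old} || 0) > ($access->{$_}{new} || 0)
    } sort keys %$access;
}

# Hypothetical helper: write one directory per line; every line,
# including the last, gets its own "\n".
sub write_results {
    my ($file, @dirs) = @_;
    open my $out, '>', $file or die "Failed to open $file for writing: $!\n";
    print {$out} "$_\n" for @dirs;
    close $out or die "Failed to close $file: $!\n";
}
```

With this split, `write_results("./archive_results.txt", predominantly_old(\%access))` replaces the final loop and the `open`/`print`/`close` lines, and runs cleanly under `-w`.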

Re: recursive directory question
by gooch (Monk) on Sep 27, 2002 at 21:08 UTC
    Granted, significant time has passed on this thread, but I figured it couldn't hurt to note that the code provided by jarich will run just fine under Win32, not just under Unix.
    FWIW
    -M Gucciard