nemesdani has asked for the wisdom of the Perl Monks concerning the following question:

Laudetur, monks. My problem is: In a given tree of subdirs, on the lowest level there are files, on other levels there aren't. In every subdir on the lowest level I'd like to select a certain file (newest from a certain type) and not process the others.

Now I've written a script, using File::Find, which uses last to skip checking the other files once I've found one, thus skipping to the next directory. I think there are more elegant and/or efficient ways to do this; I don't like using last.

Please share your thoughts. Thank you.

use strict;
use warnings;
use File::Find;

my $dir = "d:\\Teszt";

finddepth(\&gotcha, $dir);

sub gotcha {
    if ($dir eq $File::Find::dir) {
        print "same dir, skipping\n";
        last;
    }
    else {
        $dir = $File::Find::dir;
    }
    if (-f) {
        print "$File::Find::name\n";
        &process_files_in_subdir();
    }
}

sub process_files_in_subdir {
    ;    # whatever
}

Replies are listed 'Best First'.
Re: File::find hack
by JavaFan (Canon) on Feb 25, 2012 at 11:45 UTC
    If you're looking for a way to find the "newest" file, without looking at all the files (or at least, their metadata), there isn't.

    You could do something like (untested):

    use File::Spec;
    my %cache;
    foreach my $file (`find d:\\Teszt -type f`) {
        my $age = -M $file;
        my ($vol, $dir, $name) = File::Spec->splitpath($file);
        if (exists $cache{$vol, $dir}) {
            next if $age > $cache{$vol, $dir}[1];
        }
        $cache{$vol, $dir} = [$file, $age];
    }
    foreach my $newest_file_in_directory (map { $$_[0] } values %cache) {
        ... process file ...
    }
    I'm using `find` because I've never had a need to learn the File::Find syntax.
      Thank you for the answer. However maybe I wasn't clear enough. I know that once I reached the subdir, I have to check all the files there. I just don't like using last.

        I just don't like using last.

        As well you shouldn't, at least in this case. 'last' is not a correct way of exiting a subroutine, and Perl should actually throw a warning if you run your code. The correct way of coming back from a sub is 'return', not 'last'.
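        To make that concrete, here's a minimal sketch of the same once-per-directory logic using return instead of last (the variable and sub names, and the collecting into an array rather than printing, are my own for illustration - not from the OP's script):

```perl
use strict;
use warnings;
use File::Find;

my $last_dir = '';   # directory we have already taken a file from
my @picked;          # one file collected per leaf directory

sub gotcha {
    return if $File::Find::dir eq $last_dir;   # rest of this dir: skip
    return unless -f;                          # ignore subdirectories
    $last_dir = $File::Find::dir;
    push @picked, $File::Find::name;
}

# OP's start directory; guarded so the sketch is harmless elsewhere
finddepth(\&gotcha, 'd:/Teszt') if -d 'd:/Teszt';
```

        The early returns do exactly what the last was meant to do, but through the documented way of leaving a subroutine.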

        -- 
        I hate storms, but calms undermine my spirits.
         -- Bernard Moitessier, "The Long Way"
Re: File::find hack
by oko1 (Deacon) on Feb 25, 2012 at 16:55 UTC

    Not especially elegant - I only strive for grace, not necessarily achieve it :) - but here's something that I think is a bit cleaner and more direct:

    #!/usr/bin/perl
    use common::sense;
    use File::Find;

    my $dir = "/some/dir/some/where";
    my %dirs;

    finddepth(\&newest, $dir);

    sub newest {
        return unless -f;
        my ($d, $n) = ($File::Find::dir, $File::Find::name);
        $dirs{$d} ||= $n;    # first file seen in this directory
        $dirs{$d} = $n if (stat($n))[9] > (stat($dirs{$d}))[9];    # keep the newer mtime
    }

    print "$_: $dirs{$_}\n" for keys %dirs;
    -- 
    I hate storms, but calms undermine my spirits.
     -- Bernard Moitessier, "The Long Way"
      Cool! Thank you very much! Had to think for awhile what ||= does, I'm not an expert as you can clearly see...:)
Re: File::find hack
by Marshall (Canon) on Feb 26, 2012 at 00:07 UTC
    I tried a different approach. Unfortunately it didn't work out as well as I had hoped - but I'd never played with the preprocess option before. Along the way, I found that the finddepth() entry point and the bydepth => 1 option didn't work on my Windows XP machine. So mileage does vary on this point! Thought I'd pass that tidbit along since you appear to be on Windows.

    I also found out that this preprocess option, besides ordering/filtering files, can evidently be used to "prune" whole sections of the tree so they are excluded from further searching. It modifies the output of readdir() before wanted() is called - so if a directory name is filtered out, find won't follow that branch.
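    A rough illustration of that pruning idea (the directory name 'skipme' and the variable names are invented for this example):

```perl
use strict;
use warnings;
use File::Find;

my @files;

# preprocess receives readdir()'s list for the current directory; find()
# only visits the entries it returns, so dropping a directory name here
# prunes that whole branch before wanted() ever sees it.
sub prune {
    return grep { !(-d $_ && $_ eq 'skipme') } @_;
}

sub collect { push @files, $File::Find::name if -f }

# guarded example start directory
find({ preprocess => \&prune, wanted => \&collect }, 'C:/temp') if -d 'C:/temp';
```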

    Also, on Windows NTFS I am unsure whether the "atime" last-access time is really available or not. "mtime" (-M), the last-modified time, is portable between Unix and Windows. ISO-9660 (CD-ROM) can also be problematic if you want the code to work on that file system. But I think -M works on all of these.
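    For reference, -M and the raw mtime from stat() are related like this (a small self-contained sketch of my own, not from the code above):

```perl
use strict;
use warnings;

# -M reports a file's age in fractional days, measured from script start
# time ($^T); (stat)[9] is the raw epoch mtime. The two are equivalent:
sub age_in_days {
    my ($file) = @_;
    my $mtime = (stat $file)[9];
    return ($^T - $mtime) / 86400;   # 86400 seconds per day
}
```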

    Overall, I'm not too happy with this approach - if the bydepth option had worked, the code would be shorter - but it does illustrate some points that may be useful on another project, so I'll pass it along...

    #!/usr/bin/perl -w
    use strict;
    use File::Find;

    my @dirs = ('C:/temp');
    my %options = (preprocess => \&newest,
                   wanted     => \&wanted,
                   bydepth    => 1);    # bydepth didn't work on Windows XP!!

    find(\%options, @dirs);

    sub newest {
        # print "processing $File::Find::dir for ".@_." entries\n";
        # takes a list and returns a list of directories/files for
        # further processing..
        my $newest_file;
        my @files;
        foreach (@_) {
            if (!-f) { push @files, $_ }    # links, directories
            elsif (/\.txt$/) {
                $newest_file ||= $_;
                $newest_file = $_ if (-M < -M $newest_file);    # newest = smallest -M
            }
        }
        unshift(@files, $newest_file) if ($newest_file);
        return (@files);
    }

    sub wanted {
        return if -d;
        print "$File::Find::name\n";    # only one "newest .txt file!"
    }

    __END__
    Newest .txt file in each directory...
    C:/temp/10.txt
    C:/temp/inline/_Inline/build/_24to32/Comments.txt
    C:/temp/Martin-GermanSchoolProject/output.txt
    C:/temp/students/debug_notes.txt
    Update: Performance notes: when Perl does a file test, it caches the result of the underlying stat() call. The special file-test filehandle "_" (a plain underscore, not $_) in the second line below tells Perl to reuse the stat cached by the preceding -f test instead of stat()ing the file again. Likewise, caching the -M value of $newest_file in a variable would be faster than re-testing it, since file tests are relatively expensive. None of this usually matters, but once you get to, say, 1,000 files, it will start "mattering".
    $newest_file = $_ if (-M < -M $newest_file);
    $newest_file = $_ if (-M _ < -M $newest_file);
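    Putting both caching ideas together in one small sketch (the sub name and the single-directory glob are my own simplification, not the code above):

```perl
use strict;
use warnings;

# Pick the newest *.txt in the current directory, stat()ing each file once:
# -f stats the file, "-M _" reuses that cached stat, and $newest_age keeps
# the -M value of the current winner so it is never re-tested.
sub newest_txt {
    my ($newest_file, $newest_age);
    for my $file (glob '*.txt') {
        next unless -f $file;    # this stat() result gets cached in "_"
        my $age = -M _;          # no second stat() here
        if (!defined $newest_age || $age < $newest_age) {
            ($newest_file, $newest_age) = ($file, $age);
        }
    }
    return $newest_file;
}
```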