dmtelf has asked for the wisdom of the Perl Monks concerning the following question:

I've got an array of several hundred filenames (complete with paths etc).

I need to traverse a directory (+ subdirs) and weed out all files that are in the directory structure but not in the array of files to keep.

Another way of expressing this would be: how can I do a directory difference between a directory structure and an array of files?

I've thought of playing around with File::Find, building an array of the files on the disc, and comparing it with the array of files to keep, to get an array that holds all files that are on the HD but not in the files-to-keep array.

Or is there a better way of doing this?

Additionally, I could save the list of extraneous files by writing the extraneous-files array to disc. I could then loop through this list and move all these files to a "trash" directory. At a later date, I could iterate through the trash directory and optionally restore any of its files to their previous place in the directory tree.
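
A minimal sketch of that trash-and-restore idea, assuming Unix-style paths and core modules only. The helper names trash_file, restore_file and trash_all are hypothetical, as is the one-path-per-line log format. Each file is stored under the trash directory at its full original path, so its old location can be recovered from where it sits in the trash.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy     qw(move);
use File::Path     qw(make_path);

# Hypothetical helper: move $file under $trash, mirroring its original path.
# Returns the new location, or undef if the move failed ($! is set by move).
sub trash_file {
    my ($file, $trash) = @_;
    my $dest = "$trash/$file";
    make_path(dirname($dest));          # create the mirrored directories
    return move($file, $dest) ? $dest : undef;
}

# Hypothetical helper: move a trashed file back to its original location.
sub restore_file {
    my ($file, $trash) = @_;
    make_path(dirname($file));          # the old directory may be gone
    return move("$trash/$file", $file) ? $file : undef;
}

# Hypothetical driver: trash each file and append its original path to $log,
# one path per line, so the list can be replayed later for a restore.
sub trash_all {
    my ($trash, $log, @files) = @_;
    open(my $out, '>>', $log) or die "Cannot open $log: $!";
    for my $file (@files) {
        if (defined trash_file($file, $trash)) {
            print {$out} "$file\n";
        }
        else {
            warn "Could not move $file: $!\n";
        }
    }
    close($out) or die "Cannot close $log: $!";
}
```

Calling trash_all('/tmp/trash', '/tmp/trash.log', @extraneous) would move everything in the (assumed) @extraneous array; both paths here are placeholders. A log with one path per line will not cope with filenames that contain newlines.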

Any ideas?

thanks for your help, wisdom and enlightenment, o Monks.

dmtelf

Replies are listed 'Best First'.
Re: Hard disc pruner
by lhoward (Vicar) on Jul 07, 2000 at 17:23 UTC
    I'd definitely do it with File::Find. Since you already have an array of files to keep, I'd convert this to a hash (with the paths/names to keep as the keys). Then use File::Find to loop through the filesystem and check whether each file that File::Find locates is in the hash. If it is not in the hash, delete it.

    Here's a simple codeup of my recommendation. You should test it THOROUGHLY before uncommenting the unlink line.

    #!/usr/bin/perl -w
    use strict;
    use File::Find;

    my %keep;
    # load %keep with the filenames you want to keep...
    # $keep{'/bin/sh'}=1;
    # $keep{'/sbin/ifconfig'}=1;
    # etc....

    find(\&checkfile,'/');

    sub checkfile{
      # $File::Find::name is the full path; "$File::Find::dir/$_" would
      # give "//bin" for entries directly under "/"
      my $fname=$File::Find::name;
      if(!defined $keep{$fname}){
        print "deleting $fname\n";
    #    unlink $fname;
      }
    }
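
    The "load %keep" step and the array difference itself can be sketched as below. This is only one possible way to do it: extraneous_files is a hypothetical helper name, and the paths in the keep list are assumed to be written the same way File::Find reports them (absolute if the starting directory is absolute). It only collects candidates and deletes nothing.

    ```perl
    use strict;
    use warnings;
    use File::Find;

    # Hypothetical helper: return the files under $top that are not in the
    # keep list. Nothing is deleted; the caller decides what to do with them.
    sub extraneous_files {
        my ($top, @keep_list) = @_;

        my %keep;
        @keep{@keep_list} = ();    # hash slice: every kept path becomes a key

        my @extra;
        find(sub {
            # regular files (and symlinks pointing at one) only, so
            # directories never end up in the list
            return unless -f $_;
            push @extra, $File::Find::name
                unless exists $keep{$File::Find::name};
        }, $top);

        return @extra;
    }
    ```

    Using exists rather than defined means the hash values do not matter, which is why the slice can leave them undefined.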
      Danger, Will Robinson, Danger! I really hope there are no directory names in that list of yours. Unlinking directories is usually a very bad idea - problems that even fsck has a hard time fixing can result.

      Having said that, and realizing the caffeine/sleep ratio is far too high, let's have some fun. Can we do this in one line? Why, yes!

      @dead = grep { ! defined( $keep{$_} ) && ( -d $_ ? rmdir $_ : unlink $_ ) && $_ } @files;
      which will not call unlink on directories and return a list of the files deleted. It does assume that you have done a depth-first walk to generate the file names though. Now, this is useful, but I could generate the "expected to be blown away" list easier. What I would be interested in is the files that didn't get blown away.
      @dead = grep { ! defined( $keep{$_} ) && ! ( -d $_ ? rmdir $_ : unlink $_ ) } @files;
      Well this is cute, but why couldn't we blow the file away?
      @dead = map { defined( $keep{$_} ) || ( -d $_ ? rmdir $_ : unlink $_ ) ? () : "$_:$!" } @files;
      On second thought, follow lhoward's suggestion - this code is dangerous and will likely get you cursed by anybody having to maintain it.
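
      For anybody who wants the "why couldn't we blow it away" report without the one-liner, here is a rough sketch as a plain loop. The name prune is hypothetical, and the path list is still assumed to come from a depth-first walk, so that the contents of a directory are listed before the directory itself.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: remove every path in @$paths that is not a key
      # of %$keep. Returns a hash of path => error text for each removal
      # that failed.
      sub prune {
          my ($keep, $paths) = @_;
          my %failed;
          for my $path (@$paths) {
              next if exists $keep->{$path};
              # rmdir real directories only; a symlink to a directory is
              # unlinked like any other file
              my $ok = ( -d $path && !-l $path ) ? rmdir($path) : unlink($path);
              $failed{$path} = "$!" unless $ok;
          }
          return %failed;
      }
      ```

      A directory that is not in the keep list but still holds a kept file cannot be removed, so it shows up in the returned hash along with the reason.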

      Mik Firestone ( perlus bigotus maximus )