niceguy has asked for the wisdom of the Perl Monks concerning the following question:

Dear PerlMonks,

I am relatively new to Perl. I am trying to delete old files from a directory if they are not listed in a file with an SDF extension.

I have an array (@file) that contains all the file names in the directory, and another array (@name) that contains all the lines from the SDF file.

What I would like to do is compare the two arrays and find the file names that are not listed in the SDF file.

The SDF file contains lines like:

I've been searching the web and found some good examples, but none that do exactly what I am trying to do. I have tried using "!($name =~ $filename)", but it just deletes all the files in the directory. I think there is something wrong with the logic in the loop, but I am not sure where. Below is my code, which is able to find the matching files. I am wondering if you can help.

Thank you in advance for your help.

#!\perl\bin\perl
use strict;
use warnings;

my $files = "C:\\Directory";
my $list = "C:\\Test.sdf";
my @name;

open(my $name, "< $list") or die "Failed to open file: $!\n";
while(<$name>) {
    chomp;
    push @name, $_;
}
close $name;

my @file = $files;
opendir(OUTPUT, $files);
@file = readdir(OUTPUT);
closedir(OUTPUT);

foreach my $filename (@file) {
    foreach $name (@name) {
        if ($name =~ $filename) {
            last;
        }else {
            unlink ($files . "\\" . $filename) or warn qq{cannot delete $filename: $!};
            last;
        }
    }
}

Re: Compare 2 arrays
by GrandFather (Saint) on Jun 28, 2016 at 00:00 UTC

    First off, take a hard look at your nested loop. It may help to rewrite it a little to see what's going on:

    foreach $name (@name) {
        last if ($name =~ $filename);
        unlink($files . "\\" . $filename) or warn qq{cannot delete $filename: $!};
        last;
    }

    which is the same as:

    if ($name[0] !~ $filename) {
        unlink($files . "\\" . $filename) or warn qq{cannot delete $filename: $!};
    }

    because the two uses of last mean that you only ever get 1 iteration of the nested loop. That code ensures you delete the current file unless the first name happens to match the current file name.
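    For comparison, here is a minimal sketch of the nested-loop approach with the logic repaired: the inner loop only searches, and the delete decision is made once it has finished. The sample data, the fullpath="..." line format and the @to_delete array are made up for illustration, and the sketch collects names instead of calling unlink.

```perl
use strict;
use warnings;

# Hypothetical sample data: @file stands in for the readdir() listing,
# @name for the lines read from the SDF file.
my @file = ('a.nfo', 'b.nfo', 'c.nfo');
my @name = ('fullpath="C:\directory\a.nfo"', 'fullpath="C:\directory\c.nfo"');

my @to_delete;
foreach my $filename (@file) {
    my $found = 0;
    foreach my $name (@name) {
        # \Q...\E stops the "." in the file name acting as a regex wildcard
        if ($name =~ /\Q$filename\E/) {
            $found = 1;
            last;    # a match was found; no need to check further names
        }
    }
    # Decide only after every name has been checked.
    push @to_delete, $filename unless $found;
}

print "would delete: @to_delete\n";    # prints "would delete: b.nfo"
```

    This is still a substring test, so a.nfo would also match a line that names data.nfo; comparing whole names through a hash avoids that.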

    However, when you want to answer an "is it in the set?" question, use a hash. Consider:

    #!\perl\bin\perl
    use strict;
    use warnings;

    my $files = "C:\\Directory";
    my $list = "C:\\Test.sdf";
    my %keepList;

    open my $namesIn, '<', $list or die "Failed to open file: $!\n";
    while (<$namesIn>) {
        chomp;
        $keepList{$_} = 1;
    }
    close $namesIn;

    opendir my ($filesScan), $files;
    while (my $filename = readdir $filesScan) {
        next if exists $keepList{$filename};
        unlink "$files\\$filename" or warn qq{cannot delete $filename: $!};
    }
    closedir $filesScan;

    Note that I haven't tested the code!
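    The same "is it in the set?" idea, reduced to the two arrays from the question. This sketch assumes @name already holds bare file names (real SDF lines would need the name extracted first) and uses made-up data instead of readdir:

```perl
use strict;
use warnings;

# Hypothetical sample data in place of the directory listing and SDF names.
my @file = ('a.nfo', 'b.nfo', 'c.nfo', 'd.nfo');
my @name = ('a.nfo', 'c.nfo');

# Hash slice: every element of @name becomes a key (the values stay undef).
my %keep;
@keep{@name} = ();

# exists checks only for the key, so undef values are fine.
my @not_listed = grep { !exists $keep{$_} } @file;

print "@not_listed\n";    # prints "b.nfo d.nfo"
```

    Each lookup is one hash access, whereas the nested loop rescans @name for every file.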

    Premature optimization is the root of all job security
      I liked your code. A few suggestions, but also not tested!

      • Use "/" instead of "\\" for Windows file paths
      • I think some extra filtering is necessary on the $namesIn file?
      • For opendir, add a die message in case it fails
      • For readdir, skip everything except plain files. It may make no difference here, but I would do it.
      Perhaps something like this???
      #!/usr/bin/perl
      use strict;
      use warnings;

      my $files = "C:/Directory";
      my $list = "C:/Test.sdf";
      my %keepList;

      open my $namesIn, '<', $list or die "Failed to open file: $!\n";

      # minor updates to this loop #####################
      while (my $line = <$namesIn>) #updated #######
      {
          my $sdf_file;
          #next unless ($sdf_file) = $line =~ /(\w+\.nfo)"$/; #this regex should work equally well
          next unless ($sdf_file) = $line =~ /(\w+\.nfo)/;
          $keepList{$sdf_file} = 1;
          print "keeping $sdf_file\n"; #update for debugging #######
      }
      close $namesIn;

      opendir my ($filesScan), $files or die "unable to open dir $!";
      while (my $filename = readdir $filesScan)
      {
          next unless -f "$files/$filename"; #only simple files allowed
                                             #skip . and .. or other dirs
          next if exists $keepList{$filename};
          unlink "$files/$filename" or warn qq{cannot delete $filename: $!};
      }
      closedir $filesScan;
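      One way to try this logic without risking real files is to point it at a scratch directory. This sketch uses the core File::Temp module to create a throwaway directory holding made-up .nfo files, with made-up SDF lines in the fullpath="..." format seen in the test data later in the thread. It then applies the same keep-list logic and prints what survived:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway directory; CLEANUP => 1 removes it when the program exits.
my $dir = tempdir(CLEANUP => 1);

# Hypothetical test files: two are listed in the SDF lines below, one is not.
for my $name (qw(keep1.nfo keep2.nfo old.nfo)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}

# Hypothetical SDF lines in the fullpath="..." format.
my @sdf_lines = (
    'fullpath="C:\directory\keep1.nfo" id="1a"',
    'fullpath="C:\directory\keep2.nfo"',
);

# Same extraction as the loop above: pull out the first word.nfo name.
my %keepList;
for my $line (@sdf_lines) {
    my ($sdf_file) = $line =~ /(\w+\.nfo)/;
    next unless defined $sdf_file;
    $keepList{$sdf_file} = 1;
}

opendir my $dh, $dir or die "unable to open dir $dir: $!";
while (my $filename = readdir $dh) {
    next unless -f "$dir/$filename";    # skip ., .. and any subdirectories
    next if exists $keepList{$filename};
    unlink "$dir/$filename" or warn "cannot delete $filename: $!";
}
closedir $dh;

# Re-read the directory to see which files survived.
opendir $dh, $dir or die "unable to reopen dir $dir: $!";
my @remaining = grep { -f "$dir/$_" } readdir $dh;
closedir $dh;
@remaining = sort @remaining;

print "remaining: @remaining\n";    # prints "remaining: keep1.nfo keep2.nfo"
```

      If the real SDF names differ from the directory entries in case, or contain characters outside \w, the extracted keys will not match those entries and the files will be deleted. Running the script against a copy of the real SDF file is a cheap way to rule that out.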

        Hi Marshall,

        Thank you for your response and your suggestions. Unfortunately, your code also deletes all the files in the directory.

        Please let me know if you have other ideas.

      Hi GrandFather,

      Thank you for your response and your suggested code, but your code also deletes all the files in the directory.

      Do you have any other ideas?

        It may be that your file lists files to be removed, whereas I assumed it listed files to be kept (you can tell that from the name of the keepList variable). In that case you simply need to change keepList to removeList and invert the sense of the exists test.
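        Here is a minimal sketch of that inversion on made-up names. The data is hypothetical, removeList follows the suggested rename, and the sketch collects names instead of calling unlink:

```perl
use strict;
use warnings;

# Hypothetical data: names taken from the SDF file, and the directory listing.
my @sdf_names = ('old1.nfo', 'old2.nfo');
my @dir_files = ('old1.nfo', 'current.nfo', 'old2.nfo');

my %removeList = map { $_ => 1 } @sdf_names;

my @deleted;
for my $filename (@dir_files) {
    next unless exists $removeList{$filename};   # keep-list version: next if exists ...
    push @deleted, $filename;                     # real code would unlink here
}

print "@deleted\n";    # prints "old1.nfo old2.nfo"
```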

        Look at the code. Think about the code. Read the documentation for anything you don't understand. If you still can't figure it out, come back and ask about the elements of the code you don't understand.

Re: Compare 2 arrays
by Anonymous Monk on Jun 28, 2016 at 00:31 UTC

    Hi, :) readdir is no fun at all, Path::Tiny on the other hand is fun :)

    use Path::Tiny qw/ path /;

    my @fyles = path( $files )->children;
    my @ufos  = path( $files )->children( qr{\.nfo$} );
Re: Compare 2 arrays
by stevieb (Canon) on Jun 28, 2016 at 16:41 UTC

    The following works on Windows (it will not work on *nix).

    It reads all entries in the SDF file into an array, after transforming the backslashes to forward slashes and lowercasing the whole shebang.

    Then, after we've grabbed all *.sdf files in the specified directory, we lowercase these as well. On Windows, file names are case-insensitive, so this avoids breaking the exact match when the case differs.

    If the file is not in the SDF file, we delete (unlink) it.

    use warnings;
    use strict;

    use File::Find::Rule;

    my $path = 'c:/sdf'; # path to look in
    my $sdf_file = 'c:/sdf_file.txt';

    open my $fh, '<', $sdf_file
      or die "can't open the flippin' flackin' file!: $!";

    my @sdf_files;

    while (<$fh>){
        chomp;
        if (my ($file) = /fullpath="(.*)"$/){
            # replace backslash to fwd slash, and lowercase
            $file =~ s|\\|/|g;
            $file = lc $file;
            push @sdf_files, $file;
        }
    }

    my @files = File::Find::Rule->file()
                                ->name('*.sdf')
                                ->in($path);

    for my $file (@files){
        $file = lc $file;
        if (! grep {$file eq $_} @sdf_files){
            print "deleting $file\n";
            unlink $file or die $!;
        }
    }
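    The normalise-then-compare step can be tried on its own with made-up paths. This is an adaptation rather than the code above: the separator and case handling live in a hypothetical normalize helper, and the per-file grep is swapped for a hash lookup so the list is not rescanned for every file:

```perl
use strict;
use warnings;

# Hypothetical helper: unify separators and case, as the loop above does inline.
sub normalize {
    my ($path) = @_;
    $path =~ s{\\}{/}g;    # backslashes to forward slashes
    return lc $path;       # Windows file names are case-insensitive
}

# Hypothetical SDF lines and directory paths.
my @sdf_lines = (
    'fullpath="C:\SDF\Active.nfo"',
    'fullpath="C:\SDF\other.nfo"',
);
my @files = ('c:/sdf/active.nfo', 'c:/sdf/stale.nfo', 'C:/SDF/OTHER.nfo');

my %listed;
for my $line (@sdf_lines) {
    if (my ($file) = $line =~ /fullpath="(.*)"$/) {
        $listed{ normalize($file) } = 1;
    }
}

my @to_delete = grep { !exists $listed{ normalize($_) } } @files;
print "$_\n" for @to_delete;    # prints "c:/sdf/stale.nfo"
```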

      Hi stevieb,

      Thank you for your response. I apologize for not being clear about my environment. The directory contains only "*.nfo" files. Some are inactive, which is why I want to delete them. To identify which ones are inactive, I want to check whether each one is listed inside the "Test.sdf" file.

      I hope this will clarify the environment. Please let me know if you have any other suggestions.

        Please re-read my post at Re^2: Compare 2 arrays. This does take into account the *.nfo files in the SDF file. The code from stevieb can also be adjusted to do this. The Monks expect that you spend some time analyzing and understanding the code that is being written for you. You have a couple of approaches and both will work.

        Test in small increments. For example, to test the SDF parsing, you could break out my code into a short test program like this:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %keepList;
while (my $line = <DATA>)
{
    my $sdf_file;
    next unless ($sdf_file) = $line =~ /(\w+\.nfo)/;
    $keepList{$sdf_file} = 1;
    print "keeping $sdf_file\n"; #update for debugging #######
}

=example printout
keeping filename1.nfo
keeping filename2.nfo
=cut

__DATA__
fullpath="C:\directory\filename1.nfo"
id="1a"
fullpath="C:\directory\filename2.nfo"
        As a note: if your Windows file names contain spaces, the regex would need to be different. I only use file names that are compatible with both Unix and Windows, and that is probably the case here, but it may not be. That is one more reason to run a simple test on the actual file!

        update: I should clarify: when you have a choice, use only [a-zA-Z0-9_] in file names. Basically, anything that meets the rules of a valid identifier in Perl or C is fine, i.e. what Perl calls \w characters. Forgo spaces and dashes in names if you can and your life will be easier.
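        For names that do contain spaces or dashes, one option is to stop relying on \w+. Instead, capture everything between the quotes and keep only the text after the last backslash or slash. This sketch assumes the fullpath="..." format from the test data above and reads made-up lines from an array rather than __DATA__:

```perl
use strict;
use warnings;

# Hypothetical SDF lines; the first file name contains a space.
my @sdf_lines = (
    'fullpath="C:\directory\file name1.nfo"',
    'id="1a"',
    'fullpath="C:\directory\filename2.nfo"',
);

my %keepList;
for my $line (@sdf_lines) {
    # Capture the whole quoted path, spaces included.
    my ($path) = $line =~ /fullpath="([^"]+)"/;
    next unless defined $path;
    # Keep only the last path component: the text after the final \ or /.
    my ($sdf_file) = $path =~ m{([^\\/]+)\z};
    $keepList{$sdf_file} = 1;
    print "keeping $sdf_file\n";
}
```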