in reply to Re^2: File:Find pattern match question
in thread File:Find pattern match question

"We do have some duplicate directories so finding dupes is the next problem."

If you change this line in ++Athanasius' code:

$dirs{$File::Find::dir} = 1;

to

++$dirs{$File::Find::dir};

the code will run the same, but now you'll have a count. You can then find duplicates like this (untested):

my @dup_dirs = grep { $dirs{$_} > 1 } keys %dirs;
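
For example, with some made-up counts (the paths below are purely hypothetical), the grep keeps only the keys whose count exceeds one:

# Hypothetical counts, purely to illustrate the grep:
my %dirs = (
    'C:\Temp\P00000001' => 2,   # seen twice -> duplicate
    'C:\Temp\I00000002' => 1,   # seen once  -> unique
);

my @dup_dirs = grep { $dirs{$_} > 1 } keys %dirs;
print "$_\n" for @dup_dirs;     # prints only C:\Temp\P00000001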

-- Ken

Re^4: File:Find pattern match question
by RockE (Novice) on Nov 01, 2013 at 01:00 UTC

    Hi Ken, I tried your example but it wouldn't output anything. If I remove the pattern match string it does work, but I'm not sure what it is printing out - the directory listing seems random.

    #!/usr/bin/perl
    # dirpathdupes
    use strict;
    use warnings;
    use File::Find;
    use Fcntl;

    #*****************Path Variables**********************
    our $wellpath = 'N:\\repos\\open\\Wells\\Regulated\\';
    our $surveypath = 'N:\\repos\\open\\Surveys\\Regulated\\';
    our $testpath = 'C:\\Temp\\';
    #*******************************************************

    my %dirs;

    find(\&dir_names, $testpath);

    my @dup_dirs = grep { $dirs{$_} > 1 } keys %dirs;

    #print "$_\n" for sort keys %dirs;
    foreach my $l (@dup_dirs) {
        print "$l\n";
    }

    sub dir_names {
        # skip over everything that is not a directory
        return unless -d $File::Find::name;
        # skip over directories that don't match required pattern
        return unless $File::Find::dir =~ /[IPD]\d{8}$/;
        ++$dirs{$File::Find::dir};
    }

      "Hi Ken I tried your example but it wouldn't output anything."

      The technique I showed should work fine. In his response below, Athanasius has highlighted the issue with your original premise (i.e. parent vs. current directory). You can still use the technique I provided; you'll just need to work it into the code fix he's shown.
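
      For what it's worth, here's an untested sketch of that parent vs. current distinction (the directory names are only examples, and keying the hash on the entry's own name is just one possible choice):

      use strict;
      use warnings;
      use File::Find;

      my %dirs;

      # For each entry visited (e.g. a directory named P00000001 inside C:\Temp):
      #   $_                -- the entry's basename, e.g. 'P00000001' (find() has chdir'd into its parent)
      #   $File::Find::dir  -- the directory currently being scanned, e.g. 'C:\Temp'
      #   $File::Find::name -- the full path of the entry
      sub dir_names {
          return unless -d;               # test the entry itself ($_), not its parent
          return unless /[IPD]\d{8}$/;    # match the entry's own name
          ++$dirs{$_};                    # count by name: the same name under two parents gives 2
      }

      find(\&dir_names, 'C:\\Temp\\');

      print "$_\n" for grep { $dirs{$_} > 1 } sort keys %dirs;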

      "If I removed the pattern match string it does work but I'm not sure what it is printing out - the directory listing seems random."

      Hashes are unordered: keys %hash_name will return a list of keys in an apparently random order. If you're interested, see the "Hash Algorithm" section of "perlsec: Algorithmic Complexity Attacks" for more details.

      sort may provide the ordering you want. If not, you may want to consider an array, or perhaps a more complex data structure, instead of a hash, to store your data. See "perldsc - Perl Data Structures Cookbook".
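
      For instance (untested), either of these would print the duplicate list in a predictable order:

      # Plain lexical sort of the duplicate list:
      print "$_\n" for sort @dup_dirs;

      # Or case-insensitively, which may read better for Windows paths:
      print "$_\n" for sort { lc($a) cmp lc($b) } @dup_dirs;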

      -- Ken

        Thanks again, Ken. I have the following code; I do need to work on it a bit, as at the moment it's printing out C:\Temp and nothing else. Athanasius's code is printing out each of the P00 and I00 directories I made under C:\Temp for testing. I'll check out your example further, thanks.

        #!/usr/bin/perl
        # dirpathdupes
        use strict;
        use warnings;
        use File::Find;
        use Fcntl;

        #*****************Path Variables**********************
        our $wellpath = 'N:\\repos\\open\\Wells\\Regulated\\';
        our $surveypath = 'N:\\repos\\open\\Surveys\\Regulated\\';
        our $testpath = 'C:\\Temp\\';
        #*******************************************************

        my %dirs;

        find(\&dir_names, $testpath);

        my @dup_dirs = grep { $dirs{$_} > 1 } keys %dirs;

        foreach my $l (@dup_dirs) {
            print "$l\n";
        }

        sub dir_names {
            return unless -d;
            return unless /[IPD]\d{8}$/;
            ++$dirs{$File::Find::dir};
        }