
I have 20 folders in one directory (we will call it "a"), which contain subdirectories with about 1880 files in them, and 20 folders in another directory (call this one "b") with close to the same number of subdirectories and files. I need to loop through both sets of directories (a and b) and see whether any file/path name in (a) is also in (b). That is where I am at. I am able to do it pretty easily with arrays; I just thought it would be faster to use hashes, I reckon.

Re^5: Trying to understand hashes (in general)
by hexcoder (Curate) on Dec 23, 2014 at 16:26 UTC
    Yes, I agree, hashes are well suited to fast lookup tasks. Here is a minimal script that reports whether a file from (a) is in (b). It needs to be started in the directory where 'a' and 'b' reside.
    use strict;
    use warnings;
    use File::Find;    # for traversing directories with find()

    my %b_paths;              # hash holding relative paths of files of the 'b' directory
    my @dirs = ('a', 'b');    # directories to investigate
    my $dirselect = 1;        # use second @dirs entry

    # Populate the hash with paths from the second @dirs entry directory.
    # Use the relative paths as keys and 'undef' as value.
    chdir($dirs[$dirselect]);
    # traverse the directory tree
    find(sub {
        return if (!-f $_);    # ignore directories and other non regular files
        # register the relative path in the hash
        # see File::Find documentation for $File::Find::dir and $_
        # here name of current directory is $File::Find::dir
        # and name of directory entry is $_
        $b_paths{"$File::Find::dir/$_"} = undef;
    }, '.');
    chdir('..');

    # switch to the first @dirs entry
    $dirselect = 0;
    chdir($dirs[$dirselect]);

    # For each path in the directory check,
    # whether it is also present in the 'b' directory.
    # This is done with a fast lookup in the hash %b_paths (set lookup)
    # Print the result along with the path
    # traverse the directory tree
    find(sub {
        return if (!-f $_);    # ignore directories and other non regular files
        my $p = "$File::Find::dir/$_";
        my $exists = exists $b_paths{$p} ? '' : ' not';
        $p =~ s{^\.}{$dirs[$dirselect]}xms;    # change the leading '.' for the base directory name
        print "Path $p does$exists exist in the '$dirs[1-$dirselect]' directory\n";
    }, '.');
    chdir('..');
    Here is a version of the script that additionally shows which files from (b) are unique (that is, are not in (a)):
    use strict;
    use warnings;
    use File::Find;

    my %a_paths;    # hash holding relative paths of files of the 'a' directory
    my %b_paths;    # hash holding relative paths of files of the 'b' directory
    my @dirs = (['a', \%a_paths], ['b', \%b_paths]);
    my $dirselect = 1;

    # Populate the hash with paths from the second @dirs entry directory.
    # Use the relative paths as keys and 'undef' as value.
    chdir($dirs[$dirselect]->[0]);
    find(sub {
        return if (!-f $_);    # ignore directories and other non regular files
        # register the relative paths in the hash
        # see File::Find documentation for $File::Find::dir and $_
        # here name of current directory is $File::Find::dir
        # and name of directory entry is $_
        $dirs[$dirselect]->[1]->{"$File::Find::dir/$_"} = undef;
    }, '.');
    chdir('..');

    # change to the parallel first @dirs entry
    $dirselect = 0;
    chdir($dirs[$dirselect]->[0]);

    # For each path in the directory check,
    # whether it is also present in the second directory.
    # This is done with a lookup in the hash of the other entry (set lookup)
    # Print the result along with the path
    # Populate the hash with paths from the directory.
    # Use the relative paths as keys and 'undef' as value.
    find(sub {
        return if (!-f $_);    # ignore directories and other non regular files
        my $p = "$File::Find::dir/$_";
        # register the relative paths in the hash
        $dirs[$dirselect]->[1]->{$p} = undef;
        my $exists = exists $b_paths{$p} ? '' : ' not';
        $p =~ s{^\.}{$dirs[$dirselect]->[0]}xms;    # change the leading '.' for the base directory name
        print "Path $p does$exists exist in the '$dirs[1-$dirselect]->[0]' directory\n";
    }, '.');
    chdir('..');

    $dirselect = 1;
    # now find all entries in the second directory, that are unique
    for my $p (keys %{$dirs[$dirselect]->[1]}) {
        if (!exists $dirs[1-$dirselect]->[1]->{$p}) {
            $p =~ s{^\.}{$dirs[$dirselect]->[0]}xms;    # change the leading '.' for the base directory name
            print "Path $p does not exist in the '$dirs[1-$dirselect]->[0]' directory\n";
        }
    }
    It looks a bit more abstract because I wanted only one location (@dirs) that defines the base directories and their hashes. The benefit is that the code that follows has no hard-coded dependencies on them.
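
    If the chdir() calls get in the way, the same set-lookup idea can probably be written without them. What follows is only a sketch under a few assumptions, not a drop-in replacement for the scripts above: the helper names collect_paths(), only_in() and in_both() are made up for this example, the two base directories are taken from the command line, and File::Find's no_chdir option together with File::Spec->abs2rel is used to get paths relative to each base directory.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Hypothetical helper: return a hash ref whose keys are the paths of all
# regular files below $base, relative to $base. The values are unused.
sub collect_paths {
    my ($base) = @_;
    my %paths;
    find({
        no_chdir => 1,    # find() stays put; $_ equals $File::Find::name
        wanted   => sub {
            return if !-f $_;    # ignore directories and other non regular files
            $paths{ File::Spec->abs2rel($_, $base) } = undef;
        },
    }, $base);
    return \%paths;
}

# Hypothetical helper: sorted keys of %$left that are missing from %$right.
sub only_in {
    my ($left, $right) = @_;
    my @missing = grep { !exists $right->{$_} } keys %$left;
    return sort @missing;
}

# Hypothetical helper: sorted keys that occur in both hashes.
sub in_both {
    my ($left, $right) = @_;
    my @common = grep { exists $right->{$_} } keys %$left;
    return sort @common;
}

# Assumed usage: perl compare.pl a b
if (@ARGV == 2) {
    my ($dir_a, $dir_b) = @ARGV;
    my $in_a = collect_paths($dir_a);
    my $in_b = collect_paths($dir_b);

    print "In both: $_\n"            for in_both($in_a, $in_b);
    print "Only in '$dir_a': $_\n"   for only_in($in_a, $in_b);
    print "Only in '$dir_b': $_\n"   for only_in($in_b, $in_a);
}
```

    Each directory tree is walked once, and every membership test is a single exists on a hash, so the cost grows with the number of files rather than with the product of the two file counts, as it would with nested loops over arrays.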

    Have a nice Xmas...