in reply to Re: Trying to understand hashes (in general)
in thread Trying to understand hashes (in general)

It just seems that if I have a huge array of filenames and directories, and I want to compare it against another huge list of filenames and directories, it takes a long time with an array. Say, for instance, I have a list of 2000 elements and I want to compare it to another array of 2000 elements; with nested loops that's 4,000,000 iterations. Would hashes not be better for such a lookup? Maybe I am just putting too much emphasis on it, and the way I did it before is fine.
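What I am imagining is something like this untested sketch (with made-up lists standing in for the real filename arrays): build a hash from one list once, then check each element of the other list with a single lookup, so it is roughly 2000 + 2000 steps instead of 2000 * 2000.

    use strict;
    use warnings;

    # made-up data standing in for the real filename lists
    my @list_a = ('dir1/foo.txt', 'dir2/bar.txt', 'dir3/baz.txt');
    my @list_b = ('dir2/bar.txt', 'dir4/qux.txt');

    # one pass over @list_a to build a lookup hash
    my %in_a;
    $in_a{$_} = 1 for @list_a;

    # one pass over @list_b, each element checked with a single hash lookup
    for my $name (@list_b) {
        print "$name is in both lists\n" if exists $in_a{$name};
    }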

There is still a good bit that I do not understand about hashes, and seeing 10 different ways to build them kind of confuses me. ;) But in the code I posted above, why is it adding $file as both the key and the value of the hash?

Thanks for commenting btw :)

Re^3: Trying to understand hashes (in general)
by GrandFather (Saint) on Dec 23, 2014 at 06:42 UTC

    What are you trying to achieve with the compare? Depending on the answer, an array, a hash or a database may be the right tool, or maybe you don't need to store anything at all. In no case, however, should you need nested loops that run across all combinations of element pairs.

    There is no "one best solution" for all problems. A good understanding of what you are trying to achieve will very often point you toward the correct data structure, and once you have the data structure right, everything else usually just slots into place around it.
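    For example, for a simple "is this name in the other list" test, a lookup hash turns the all-pairs comparison into two linear passes. A rough, untested sketch with made-up names:

        use strict;
        use warnings;

        my @list_a = ('x.txt', 'y.txt', 'z.txt');    # made-up data
        my @list_b = ('y.txt', 'q.txt');

        # a hash slice builds the lookup set in one statement;
        # the values are all undef, only the keys matter
        my %in_b;
        @in_b{@list_b} = ();

        # one pass over @list_a instead of one pass per element of @list_b
        my @in_both = grep { exists $in_b{$_} } @list_a;
        print "in both: @in_both\n";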

    Perl is the programming world's equivalent of English
      I have 20 folders in one directory (we will call this dir "a") that contain sub dirs with about 1880 files in them, and 20 folders in another directory (call this one "b") with close to the same number of sub dirs and files. I need to loop through both sets of directories (a and b) and see if any file/path name in (a) is in (b). That is where I am at. I am able to do it pretty easily with arrays; I just thought it would be faster to use hashes, I reckon.
        Yes, I agree, hashes are well suited to fast lookup tasks. Here is a minimal script that answers whether a file from (a) is also in (b). It needs to be started in the directory where 'a' and 'b' reside.
        use strict;
        use warnings;
        use File::Find;    # for traversing directories with find()

        my %b_paths;              # hash holding relative paths of files of the 'b' directory
        my @dirs = ('a', 'b');    # directories to investigate
        my $dirselect = 1;        # use second @dirs entry

        # Populate the hash with paths from the second @dirs entry directory.
        # Use the relative paths as keys and 'undef' as value.
        chdir($dirs[$dirselect]);

        # traverse the directory tree
        find(sub {
            return if (!-f $_);    # ignore directories and other non regular files

            # register the relative path in the hash
            # see File::Find documentation for $File::Find::dir and $_
            # here name of current directory is $File::Find::dir
            # and name of directory entry is $_
            $b_paths{"$File::Find::dir/$_"} = undef;
        }, '.');
        chdir('..');

        # switch to the first @dirs entry
        $dirselect = 0;
        chdir($dirs[$dirselect]);

        # For each path in the directory check,
        # whether it is also present in the 'b' directory.
        # This is done with a fast lookup in the hash %b_paths (set lookup).
        # Print the result along with the path.

        # traverse the directory tree
        find(sub {
            return if (!-f $_);    # ignore directories and other non regular files

            my $p = "$File::Find::dir/$_";
            my $exists = exists $b_paths{$p} ? '' : ' not';
            $p =~ s{^\.}{$dirs[$dirselect]}xms;    # change the leading '.' for the base directory name
            print "Path $p does$exists exist in the '$dirs[1-$dirselect]' directory\n";
        }, '.');
        chdir('..');
        Here is a version of the script that additionally shows which files from (b) are unique (that is, are not in (a)):
        use strict;
        use warnings;
        use File::Find;

        my %a_paths;    # hash holding relative paths of files of the 'a' directory
        my %b_paths;    # hash holding relative paths of files of the 'b' directory
        my @dirs = (['a', \%a_paths], ['b', \%b_paths]);
        my $dirselect = 1;

        # Populate the hash with paths from the second @dirs entry directory.
        # Use the relative paths as keys and 'undef' as value.
        chdir($dirs[$dirselect]->[0]);
        find(sub {
            return if (!-f $_);    # ignore directories and other non regular files

            # register the relative path in the hash
            # see File::Find documentation for $File::Find::dir and $_
            # here name of current directory is $File::Find::dir
            # and name of directory entry is $_
            $dirs[$dirselect]->[1]->{"$File::Find::dir/$_"} = undef;
        }, '.');
        chdir('..');

        # change to the first @dirs entry
        $dirselect = 0;
        chdir($dirs[$dirselect]->[0]);

        # For each path in this directory, check whether it is also present
        # in the other directory. This is done with a lookup in the hash of
        # the other @dirs entry (set lookup). Print the result along with the
        # path, and also register the path in this directory's own hash.
        find(sub {
            return if (!-f $_);    # ignore directories and other non regular files

            my $p = "$File::Find::dir/$_";

            # register the relative path in this directory's hash
            $dirs[$dirselect]->[1]->{$p} = undef;

            # look the path up in the other directory's hash (%b_paths here)
            my $exists = exists $dirs[1-$dirselect]->[1]->{$p} ? '' : ' not';
            $p =~ s{^\.}{$dirs[$dirselect]->[0]}xms;    # change the leading '.' for the base directory name
            print "Path $p does$exists exist in the '$dirs[1-$dirselect]->[0]' directory\n";
        }, '.');
        chdir('..');

        $dirselect = 1;

        # now find all entries in the second directory that are unique
        for my $p (keys %{$dirs[$dirselect]->[1]}) {
            if (!exists $dirs[1-$dirselect]->[1]->{$p}) {
                $p =~ s{^\.}{$dirs[$dirselect]->[0]}xms;    # change the leading '.' for the base directory name
                print "Path $p does not exist in the '$dirs[1-$dirselect]->[0]' directory\n";
            }
        }
        It looks a bit more abstract because I wanted a single location that defines the base directories and their hashes (@dirs). The benefit is that the rest of the code has no hard-coded dependencies on them.
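        To make the output concrete: assuming a hypothetical layout with the files a/x/f1.txt, b/x/f1.txt and b/y/f2.txt, the second script would print something like

            Path a/x/f1.txt does exist in the 'b' directory
            Path b/y/f2.txt does not exist in the 'a' directory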

        Have a nice Xmas...