james28909 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! I am trying my best to understand exactly how to add just a key, without a value, to a hash. I understand very little when it comes to hashes, and I am trying to get away from using arrays, because it is my understanding that finding a certain key in a hash is much quicker than finding an element in an array when you have a huge number of elements.

So what I was planning on doing was migrating a few of my scripts to use hashes instead of arrays. Here is the code I have (which works), but to be honest I am uncertain as to how it works:
use warnings;
use strict;

my $path = shift;    # directory to list, taken from the command line
opendir( my $DIR, $path ) or die "Cannot open directory '$path': $!";
my %dirs;
while ( my $file = readdir($DIR) ) {
    next if ( $file eq '.' || $file eq '..' );
    $dirs{$file} = $file;    # key and value are both the file name
}
closedir($DIR);

print "$_\n" for sort keys %dirs;
#print "$_\n" for sort values %dirs;   #same as keys...
#print "$_\n" for sort %dirs;
This will, of course, read a directory and print out that directory's contents, whether each entry is a file or a directory. Also, if you comment out the first print statement and uncomment the last one, it prints duplicates, and I am unsure why it is adding the filename as both the key and the value of the hash.

Any insight into this would be very much appreciated and thanks.

EDIT: What I am eventually going to shoot for is a hash of hashes, and in these hashes will be directories and filenames respectively, so any input on that would be appreciated as well. :)

Re: Trying to understand hashes (in general)
by GrandFather (Saint) on Dec 23, 2014 at 05:45 UTC

    Arrays are good for doing array stuff and hashes are good for doing hash stuff. If you set aside how they work under the hood, arrays and hashes are nearly identical (in PHP they are essentially identical). The "difference" between arrays and hashes is that arrays are indexed by numbers and hashes are indexed by strings.
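A minimal sketch of that difference in Perl. The variable names and data are purely illustrative:

```perl
use strict;
use warnings;

# An array is indexed by position (a number, starting at 0)
my @colors = ('red', 'green', 'blue');
print "Index 1 holds $colors[1]\n";           # green

# A hash is indexed by a string key
my %rgb = (red => 'ff0000', green => '00ff00', blue => '0000ff');
print "The key 'green' holds $rgb{green}\n";  # 00ff00

# A number can also be used as a hash key; Perl treats it as a string
my %by_number = (0 => 'red', 1 => 'green', 2 => 'blue');
print "Key 1 holds $by_number{1}\n";          # green
```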

    Arrays are really good when you have a list of things you want to store and either they naturally are keyed by a number, or have no key but may be ordered. It's really fast to access elements in an array by their index (position in the array). Perl arrays are also very efficient at adding and removing elements at the start and end of the array. Arrays tend to be a poor choice if there are large gaps where index values have been skipped.
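A short sketch of the end-of-array operations mentioned above (`unshift`/`shift` at the start, `push`/`pop` at the end), plus what a large skipped index does to an array's size. The data is made up for illustration:

```perl
use strict;
use warnings;

my @queue = (2, 3);

unshift @queue, 1;    # add to the start: (1, 2, 3)
push    @queue, 4;    # add to the end:   (1, 2, 3, 4)

my $first = shift @queue;    # remove from the start: 1, leaving (2, 3, 4)
my $last  = pop   @queue;    # remove from the end:   4, leaving (2, 3)

print "first=$first last=$last remaining=@queue\n";   # first=1 last=4 remaining=2 3

# Assigning to index 1000 extends the array to 1001 slots, almost all undef,
# which is why arrays are a poor fit for widely scattered indices.
my @sparse;
$sparse[1000] = 'x';
print scalar(@sparse), "\n";   # 1001
```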

    Hashes are really good where you want to access values by name. Note that there is no reason the name can't be a number so in that sense hashes and arrays can do the same job. Although hashes are pretty quick at looking up values by name, they aren't as fast as arrays. The other major difference is that hashes don't remember the order that elements were inserted so they generally can't be used in a trivial fashion to store ordered data.
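Because `keys` returns hash entries in no particular order, a common workaround (one of several) is to sort the keys when a stable order is enough, or to keep the insertion order in a separate array. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

my @order = qw(zebra apple mango);     # the insertion order we want to remember
my %count;
$count{$_} = length $_ for @order;     # arbitrary values for the example

# keys %count comes back in an unspecified order, so sort when order matters
my @sorted = sort keys %count;
print "@sorted\n";                     # apple mango zebra

# To preserve insertion order, walk the separate array instead of the hash
print "$_ => $count{$_}\n" for @order;
```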

    Unless you are generating a set of unique values by taking advantage of the fact that hash keys are unique, it seldom makes sense to use a hash just to store keys. So, on the face of it, a hash doesn't make sense for storing a file system's structure, because the file system doesn't allow duplicate names within a directory in any case. Nested arrays are a much better fit for a file system's structure.
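The "set of unique values" case mentioned above looks roughly like this. It is a sketch with hypothetical file names; the second variant keeps the first occurrence of each name in its original order:

```perl
use strict;
use warnings;

my @names = qw(foo.txt bar.txt foo.txt baz.txt bar.txt);

# Hash keys are unique, so duplicates collapse; the values are never used
my %seen;
@seen{@names} = ();
my @unique = sort keys %seen;
print "@unique\n";      # bar.txt baz.txt foo.txt

# Order-preserving variant: keep a name only the first time it is seen
my %first;
my @in_order = grep { !$first{$_}++ } @names;
print "@in_order\n";    # foo.txt bar.txt baz.txt
```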

    Perl is the programming world's equivalent of English
      It just seems that if I have a huge array of filenames and directories, and I want to compare it against another huge list of filenames and directories, it takes a long time with an array. Say, for instance, I have a list of 2000 elements and I want to compare it to another array of 2000 elements: that's 4,000,000 iterations through loops. Would hashes not be better for that kind of lookup? Maybe I am just putting too much emphasis on it, and the way I did it before is fine.

      There is still a good bit that I do not understand about hashes, and seeing 10 different ways to build them kind of confuses me. ;) But in the code I posted above, why is it adding $file to both the keys and the values of the hash?

      Thanks for commenting btw :)

        What are you trying to achieve with the compare? Depending on the answer, an array, a hash or a database may be a good answer, or maybe you don't need to store anything at all. In no case, however, should you need nested loops that run across all combinations of element pairs.
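One way to avoid the nested loops, assuming the goal is simply to find which names appear in one list but not the other (or in both), is to load one list into a hash and then do a single pass over the other. The list contents below are hypothetical; in practice they might come from `readdir`:

```perl
use strict;
use warnings;

my @old = qw(a.txt b.txt c.txt d.txt);
my @new = qw(b.txt d.txt e.txt);

# One pass over @new builds the lookup table ...
my %in_new;
@in_new{@new} = ();

# ... and one pass over @old does a hash lookup per element,
# so the work grows with @old + @new rather than @old * @new.
my @only_in_old = grep { !exists $in_new{$_} } @old;
my @in_both     = grep {  exists $in_new{$_} } @old;

print "Only in old: @only_in_old\n";   # a.txt c.txt
print "In both:     @in_both\n";       # b.txt d.txt
```

With two lists of 2000 names each, this is on the order of 4000 hash operations instead of 4,000,000 pairwise comparisons.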

        There is no "one best solution" for all problems. Having a good understanding of what you are trying to achieve very often will point you toward the correct data structure and once you have the data structure right very often everything else just slots into place around it.

Re: Trying to understand hashes (in general)
by Athanasius (Archbishop) on Dec 23, 2014 at 06:04 UTC

    Hello james28909,

    A hash is an associative array, in which data is stored in key/value pairs. For example, in:

    my %hash = (Fred => 'Wilma', Barney => 'Betty', Homer => 'Marge');

    the keys are Fred, Barney, and Homer, and their corresponding values are Wilma, Betty, and Marge, respectively. Now, in your script, the line:

    $dirs{$file} = $file;

    adds a new key/value pair to the %dirs hash, and in this pair the key and the value are the same (viz., whatever is stored in $file). This is an unnecessary duplication of the data. It would be more normal in this case to set the value to undef (or possibly 1).
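This duplication is also why the commented-out `print "$_\n" for sort %dirs;` line in the original script prints every name twice: in list context a hash flattens into key, value, key, value, and so on. A minimal sketch with two hypothetical entries, followed by the undef-valued alternative:

```perl
use strict;
use warnings;

my %dirs = (a => 'a', b => 'b');   # key == value, as in the original script

# In list context a hash flattens to key, value, key, value, ...
my @flat = sort %dirs;
print "@flat\n";                   # a a b b  (each name once as a key, once as a value)

# Storing undef instead keeps the keys without duplicating the data
my %names;
$names{$_} = undef for qw(a b);
print join(' ', sort keys %names), "\n";                   # a b
print exists $names{a} ? "a is present\n" : "a is absent\n";   # a is present
```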

    If you do later convert this into a hash of hashes (but you would be better off following GrandFather’s advice and using an array of arrays), then each value will be a reference to an anonymous hash which you create on the fly:

    $dirs{$file} = { ... };
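As a rough sketch of what such a hash of hashes could look like, assuming (hypothetically) that the outer key is a directory path and the inner hash is the set of file names in that directory:

```perl
use strict;
use warnings;

# Hypothetical layout: outer key = directory, inner hash = set of file names
my %tree;
$tree{'/tmp/foo'}{'bar.txt'} = 1;    # autovivification creates the inner hash
$tree{'/tmp/foo'}{'baz.txt'} = 1;
$tree{'/tmp/qux'} = { 'a.log' => 1, 'b.log' => 1 };   # or assign an anonymous hash directly

for my $dir (sort keys %tree) {
    # $tree{$dir} is a hash reference; %{ ... } dereferences it
    my @files = sort keys %{ $tree{$dir} };
    print "$dir: @files\n";
}
# /tmp/foo: bar.txt baz.txt
# /tmp/qux: a.log b.log

print "found\n" if exists $tree{'/tmp/foo'}{'bar.txt'};
```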

    You should study the tutorial perldsc (“Perl Data Structures Cookbook”).

    Hope that helps,

    Athanasius <°(((>< contra mundum
    Iustus alius egestas vitae, eros Piratica,

      Thanks for the info, and the link as well.
Re: Trying to understand hashes (in general)
by davido (Cardinal) on Dec 23, 2014 at 07:20 UTC

    Just because a hash stores key/value pairs doesn't mean that you need to put anything useful in the value.

    C++ comes with std::unordered_set and std::unordered_map. The first one is designed for just a set of keys. The second is a set of keys that map to values. The second one is more like Perl's hashes. But the fact that the two different containers exist is mostly a matter of memory efficiency and semantic purity.

    Now back to Perl: Your hashes won't be growing too big, so memory efficiency probably isn't an issue, and we don't need to be too particular about semantic purity. Hashes often may be used where you might think in terms of sets.

    my %heros;
    @heros{ qw( thetick wonderwoman batman superman spiderman ) } = ();
    print "Yes!\n" if exists $heros{thetick};

    In the code above we're creating a hash called %heros that contains elements named for various superheroes. But we don't explicitly assign a value to each of the elements. Their value is undefined, and it really doesn't matter because we never use it. Later we test to see if 'thetick' is among our set of superheroes.


    Dave

Re: Trying to understand hashes (in general)
by FloydATC (Deacon) on Dec 23, 2014 at 08:30 UTC
    I am trying my best to understand exactly how to add just a key without a value to a hash.

    Although not actually very useful in the real world, this can be accomplished by simply assigning undef to a hash key. The key will then exist:

    my %hash = ();
    $hash{'foo'} = undef;

    if (exists $hash{'foo'}) {
        print "The key 'foo' exists.\n";
    }
    else {
        print "The key 'foo' does not exist.\n";
    }

    if (defined $hash{'foo'}) {
        print "The key 'foo' is defined.\n";
    }
    else {
        print "The key 'foo' is undefined.\n";
    }

    if ($hash{'foo'}) {
        print "The key 'foo' is true.\n";
    }
    else {
        print "The key 'foo' is false.\n";
    }

    What I am going to eventually shoot for, is making a hash of hashes

    To accomplish this, you would make the value of your outer hash (of hashes) a reference to the inner hash, like so:

    my %inner_hash = ( dir => '/tmp/foo/', filename => 'bar.txt' );
    my %hash_of_hashes = ();
    $hash_of_hashes{'baz'} = \%inner_hash;    # Backslash = reference to %inner_hash
    print "The filename associated with 'baz' is "
        . $hash_of_hashes{'baz'}->{'filename'} . "\n";

    Or you could use references all the way to begin with: (Notice the curly brackets)

    my $entry = { firstname => 'Ola', lastname => 'Nordmann' };
    my $staff = {};
    my $id = 123;
    $staff->{$id} = $entry;
    printf( "%d: %s, %s\n", $id, $staff->{$id}->{'lastname'}, $staff->{$id}->{'firstname'} );

    -- FloydATC

    Time flies when you don't know what you're doing

      Thanks for the thorough examples.
Re: Trying to understand hashes (in general)
by Rosema1 (Initiate) on Dec 26, 2014 at 01:41 UTC
    The keys and values of your hash are the same because your code sets them that way: $dirs{$file} = $file. If you just want your hash to have keys and no values, you could use $dirs{$file} = ''.
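One caveat worth keeping in mind with an empty-string value: '' is false in Perl, so a plain truth test on the value reports "no" even though the key is present. A small sketch using a hypothetical file name, showing why `exists` is the safer membership test:

```perl
use strict;
use warnings;

my %dirs;
$dirs{'notes.txt'} = '';    # key present, value is the empty string

# '' is false, so a plain truth test gives the wrong answer for membership ...
print "truth test: ", ($dirs{'notes.txt'} ? 'yes' : 'no'), "\n";         # no
# ... while exists only checks whether the key is in the hash
print "exists:     ", (exists $dirs{'notes.txt'} ? 'yes' : 'no'), "\n";   # yes
```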