ghenry has asked for the wisdom of the Perl Monks concerning the following question:

How do you know which one to use and when?

Thanks.

Walking the road to enlightenment... I found a penguin and a camel on the way..... Fancy a yourname@perl.me.uk? Just ask!!!
  • Comment on What defines the choice of a Hash or an Array?

Replies are listed 'Best First'.
Re: What defines the choice of a Hash or an Array?
by mattr (Curate) on Mar 31, 2005 at 08:42 UTC
    Another way to think of hashes and arrays are as physical metaphors.

    An array is like a stack of blocks. You can add on or pull off a block from the top or bottom of the stack, and you can pull out say the 15th block in the stack. An empty array is like a spot on a table where you can pile blocks. Imagine that each block is transparent with something (a number, a letter, a word, a hyperlink to some other structure..) inside. But the blocks are opaque on the sides, you have to pull one out and open the cover to see what's in a block.

    So you can easily pick out a block if you know its height from the table (the bottom block is 0 blocks away, so it is index # 0). Arrays are also called lists, so they are good for storing things ("values") if you have a whole list of them. So an array is a sequential list of values that you can access according to their position in the sequence.

    Hashes are different from arrays. If you think about an array as a stack of blocks with opaque sides, you realize that even if you have a 100 block tall stack you can't see what's inside at a glance from the side. You have to count to the 15th block and pull it out. (Well perl will do the counting but you have to specify a block number, or else pop off the top 85 blocks.) So an array, like a stack of opaque blocks, is really a sequential thing that you access one block at a time usually.

    A hash isn't like that. It's more like having everything transparent. We are talking random access, not sequential. In fact, there is no stack. Let’s make a hash called %trees. Imagine you are sitting on the floor with your blocks scattered all over the place in front of you. Each block has a unique name or number (usually we like to use English words) printed on it in big easy to read letters, each block associated with its own label. So there is a block called "maple" and inside that block maybe it has the number "50" to describe the height of the tree. Another block is called "sequoia" and inside that block is the number "200" for the height of a sequoia tree.

    Say you have a hundred blocks scattered in front of you, all associating the name of a tree with its height. How long does it take to find how tall a dogwood tree is? About no time at all. You can see the keyword "Dogwood" (we call it the "key" of the block) standing out right in front of you, and immediately pick up the Dogwood block and retrieve the value "25" from inside it. Ahah, a Dogwood tree is 25 feet high. So it is really easy to keep a huge amount of data in an easily retrievable way, just assign an easily remembered keyword to it and toss it on the floor (into the hash). You can do this for ten or ten thousand items, or more, it doesn’t matter, you can always pick up the right block if you know its key right away.

    If you were using an array (a "stack of blocks") it would be much harder to do this. First of all, a block in an array doesn’t have a keyword, to specify one in an array you have to specify how high it is off the ground (its index, or number in the sequence as you count up from the ground starting at zero). Say you want to store information about a Pine tree being 75 feet high in an array. One easy perlish way of doing it would be to jam the name of the tree and its height together into a single string of letters, separated by a symbol (say we use a colon character as the delimiter). So the string for this tree is "pine:75" which is the value we will stuff into a block we push onto the stack of blocks, our array. We say push(@trees, join(":","pine","75")) to do so. But if we want to look up the height of a given tree in a tall array, we have to go looking inside each block one at a time and see if the value inside each block matches the name of the tree we want (using a regular expression to match the head of the string maybe). All of this trouble is eliminated by using a hash.

    An array of course is better than a hash if you have two apple trees of different heights. Because each block in a hash has to have a unique keyword attached to it. If you change the name slightly (say "apple1" and "apple 2") a hash is still useable, but for a list of heights of say twenty different apple trees, an array is better. In this case we know they are apple trees and we can just put their height into the block for that number apple tree.

    I hope this helps you a little bit in visualizing the difference between an array (a stack of opaque blocks) and a hash (blocks scattered in front of you on the floor). I use hashes often to keep track of a bunch of things at once, and you might like to try the exists and defined functions for hash elements too. You can easily see if a pear tree exists in the hash %trees by saying: if (exists $trees{pear}) { print "Yes\n";} If you read more of the perl manuals on data structures (see the PM library) you will discover that you can stuff arrays into hashes and vice versa, making all kinds of complex structures. So the best answer to our tree survey would be to make a hash of lists (a HoL, where you stuff an array into each hash element), so that for the hash key "apple" we can have an ordered list of heights for all apple trees, like $trees{apple}[0] = 20, $trees{apple}[1] = 22, etc.

    Finally you might like to consider the reason why these structures are called hashes and arrays in the first place. An array really can be more than just a one-dimensional stack of blocks. It can have many dimensions, so a three dimensional array would be like a solid cube full of blocks. You would still have to select a block by its index along each dimension (think its x,y,z coordinates) like $block[2][5][15]. So an array has elements neatly arrayed by number.

    A hash is a term that talks about a kind of mathematical function that distributes data well, and you could imagine that in order to instantly access any block out of say a hundred thousand scattered all over the floor, you are going to need a really good way of distributing that information. In fact a hash in perl stores data in something like buckets hanging from tree limbs and the speed with which you can get data out of a bucket has to do with making sure that you don't have too many branches, or too much data in a bucket. In fact the keyword you pick is mathematically converted (or "encrypted" or "hashed") so well that you can pick a million different keywords and be very sure they will not collide with each other after being hashed.

    Have fun! how

      Great reply. Thanks.

      Walking the road to enlightenment... I found a penguin and a camel on the way..... Fancy a yourname@perl.me.uk? Just ask!!!
Re: What defines the choice of a Hash or an Array?
by davido (Cardinal) on Mar 31, 2005 at 07:31 UTC

    An array is just a list of things indexed by sequential numbers. A hash is an "associative array"; an unordered list of things indexed by keys that are each associated with one of the things. Just as an array's indices are 'unique' (you can't have two elements names 33, for example), a hash's keys are also unique (you can't have two hash keys both named 'pete'). Hash elements aren't inherently ordered. That means that there isn't any concept of 'pete comes before dave'.

    So use an array anytime you need an ordered list, or a list of things indexed numerically starting at zero. Use an array if you need to push, pop, shift, or unshift.

    Use a hash anytime you need to organize a bunch of things indexed by keys instead of '0 .. n' subscripts. Use a hash where order is unimportant (or where you don't mind sorting the keys, or maintaining a sorted array of keys). Use a hash if you need to maintain uniqueness of keys. Use a hash for sparse arrays, if it makes sense to do so. Use a hash if you need to quickly verify existance of an item based on its key (exists).

    There are an infinite number of uses for each.


    Dave

      So, you could decide by how you would like to manipulate it?

      Also, if you don't want to go to the extra effort of sorting a hash, you can just stick with an Array?

      Walking the road to enlightenment... I found a penguin and a camel on the way..... Fancy a yourname@perl.me.uk? Just ask!!!
Re: What defines the choice of a Hash or an Array?
by castaway (Parson) on Mar 31, 2005 at 07:37 UTC
    Hash:
    • Sparse contents - i.e. indices of 1, 100, 3000 etc, an array would necessarily also create all the indices between, a hash doesnt.
    • Indices that arent just numbers, any string can be used as a hash index.
    • Uniqueness of items. Yes, array indices are also unique, a hash can be used, for example, to filter unique items from a list of strings.
    Array:
    • Ordered lists: Hashes arent guaranteed to return their contents in any particular order, arrays are used in ascending order of indices.
    • Uses less space than a hash (I think)

    No doubt there are some that I'm missing.

    C.

      • Uses less space than a hash (I think)

      This was the kind of thing I was thinking about.

      Walking the road to enlightenment... I found a penguin and a camel on the way..... Fancy a yourname@perl.me.uk? Just ask!!!
        That is correct, but its not as obvoious as you may think.

        Among other things, realize that the space may be continous for array bodies or may not.
        Also, as the hash keys are stored in a separate table, it depends on how long is the key, and of course, the length of the value for both hashes and arrays.

        Some examples:

        use Devel::Size qw(size); my @array; my %hash; my $array_size = size(\@array); my $hash_size = size(\%hash); print "Array: $array_size, Hash: $hash_size\n";
        The hash allocates more memory in this case, and allocates more than the array in a lineal comparison for every filled key-value pair in the hash, and any contiguous index-value in the array.

        On the other hand, you can try this:
        use Devel::Size qw(size); use Data::Dumper; my @array; my %hash; $hash{'f'} = 140; $array[141] = 142; print Dumper @array; my $array_size = size(\@array); my $hash_size = size(\%hash); print "Array: $array_size, Hash: $hash_size\n";

        Array indexes 0..141 are undefined, but some memory is reserved. In this case, for a single value a hash takes few memory.

        How come I lose reputation for a reply?

        Walking the road to enlightenment... I found a penguin and a camel on the way..... Fancy a yourname@perl.me.uk? Just ask!!!
        Uses less space than a hash (I think)

        Unless you're using e.g. phone-numbers as the index: see the sparse array point above. "Uses less space per value than an array" would be better but kind of obvious - Perl has to store the key as well.

Re: What defines the choice of a Hash or an Array?
by thinker (Parson) on Mar 31, 2005 at 08:32 UTC

    Hi ghenry,

    When I am thinking of my code, there are two words which I look out for which suggest I might want ot use a hash. Those words are "of" and "contains" (or its more perlish alternative, exists).

    An example of "of" might be

    my %capitals = (France => "Paris", Scotland => "Edinburgh", ); # find the capital "of" Scotland print $capitals{"Scotland"}, "\n";

    The fast access speed of hashes makes it a good candidate to find the intersection of two lists. For (a rather contrived) example

    # we have a pre-prepared list of prime numbers my @primeslist = qw(2 3 5 7 11 13 17 19 23 29); # populate a hash with the primes as keys # this allows for fast retrieval my %primes; @primes{@primeslist} = (); # now we can loop through our new list ( 1 .. 30) # using the exists() function to (quickly) check if a key exists print "$_ is prime\n" for grep exists $primes{$_}, 1 .. 30;

    Hope this helps

    thinker

      Thanks thinker.

      Walking the road to enlightenment... I found a penguin and a camel on the way..... Fancy a yourname@perl.me.uk? Just ask!!!
Re: What defines the choice of a Hash or an Array?
by inman (Curate) on Mar 31, 2005 at 07:56 UTC
    Not so much an answer to the question but a comment on your sig. Since you appear to be talking to animals on your journey, you may wish to ask the camel to introduce you to the llama. The llama himself is a good teacher (even if his owner can be grouchy at times!) and will start you on the path to Perl enlightenment.

      Got that one.

      I was just after a maybe, in the field answer.

      Walking the road to enlightenment... I found a penguin and a camel on the way..... Fancy a yourname@perl.me.uk? Just ask!!!
Re: What defines the choice of a Hash or an Array?
by jhourcle (Prior) on Mar 31, 2005 at 12:33 UTC

    Let's look at this as a sort of a flow chart, to decide what our main data type is. I'm not going to go into when we'd use an array of arrays vs. an array of hashes, vs. a tied hash, etc.

    • Do we care about the order in which the items were put in? (if yes : array).
    • Are the values indexed by something other than positive integers or zero? (if yes : hash)
    • Are the indexes sparely populated, if we look from the minimum to the maximum index? (if yes : hash)
    • Are the indexes sparely populated, if we look from 0 to the maximum index?* (if yes : hash)
    • All others : (use an array)

    *Note : if you can shift your indexes, you might be able to get away with this, but you won't access the indexes directly, which might confuse some people. Eg, if we're tracking years 1920-2000, we might index on (year - 1900). We can also handle floats this way, if we have a known precision, like with yesterdays's discussion of indexing on earthquake magnitude in Problems with defining hashes, or we could handle negative integers, etc.

Re: What defines the choice of a Hash or an Array?
by PodMaster (Abbot) on Mar 31, 2005 at 07:33 UTC
    How do you know which one to use and when?
    If you know the difference between an array and a hash, you'll know when to use which.

    `perldoc perldata'

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

      I do know the difference, but you could use an array, as it's ordered, and, for the same thing, you could just use a hash and then sort it.

      I wanted to hear from people on their preference and I thought that there might be a performance issue etc.

      But thanks for the links.

      Walking the road to enlightenment... I found a penguin and a camel on the way..... Fancy a yourname@perl.me.uk? Just ask!!!
        I do know the difference, but you could use an array, as it's ordered, and, for the same thing, you could just use a hash and then sort it.
        Say you live a city next to an airport, and you want to visit a friend who lives 20 miles away at the other end of city also next to an airport, do you fly or do you drive?

        Use a hash when it benefits you to do so, not just because you can.

        MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
        I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
        ** The third rule of perl club is a statement of fact: pod is sexy.

Re: What defines the choice of a Hash or an Array?
by tcf03 (Deacon) on Mar 31, 2005 at 07:41 UTC
    from perldata:
    Perl has three data structures: scalars, arrays of scalars, and associ +ative arrays of scalars, known as ``hashes''. Normal arrays are index +ed by number, starting with 0. (Negative subscripts count from the en +d.) Hash arrays are indexed by string.
    but actually reading perldata would be your best bet.
Re: What defines the choice of a Hash or an Array?
by gam3 (Curate) on Mar 31, 2005 at 16:24 UTC
    I think that the name of the perl data types: ARRAY and HASH are confusing this issue. It is important not to confuse these data types with the data structures with simular names. Both perl data types can be used to create arrays:
    $ARRAY->[1][2] = 'bob';
    $HASH->{1}{2} = 'bob';
    

    As this table shows about the only thing you can use a HASH data type for directly is as an array.

    data typeARRAYHASH
    arrayXX
    listX
    iteratorX
    stackX
    queueX

    So should you use and HASH or an ARRAY to store arrays? Accessing the ARRAY data type is only slightly faster than accessing the HASH data type, so I would recomend using HASH arrays except for rare cases were speed is vital. If you your arrays are more more 1 dimentional then the ARRAY data type really looses all of its advantage over the HASH data type, as it can no longer be used directly as a list.

             Rate  HASH ARRAY
    HASH  54856/s    --   -5%
    ARRAY 57536/s    5%    --
    
    And of course every thing else is in an ARRAY (or a SCALAR).
    -- gam3
    A picture is worth a thousand words, but takes 200K.

      I'm not sure why you bring up the issue of storing array references in hashes versus arrays. I'm also not sure why speed is a consideration. Arrays are useful where you need ordered access, where you might use a data structure such as a stack, a queue, a dequeue, or access to a list of items by sequential indexes. Hashes are useful when you need keyed access to a list of items where near-constant access time is more important than preserving insertion order.

      That dubiousity aside, I think the rest of your post is confusing misinformation.

      I think that the name of the perl data types: ARRAY and HASH are confusing this issue. It is important not to confuse these data types with the data structures with simular names.

      No, this is wrong.

      The Perl data type has the name ARRAY because it is an array. The Perl data type has the name HASH because it is a hash.

      Both perl data types can be used to create arrays:
      $ARRAY->[1][2] = 'bob'; $HASH->{1}{2} = 'bob';

      This is half wrong. The first is an array because it's an array. The second is not an array because it's a hash. Try iterating through the indexes of the hash, for example. Try using a non-integer as the index of an array. Try using a huge integer as the index of an array.

      Sure, if you provide a really degenerate, simplified definition of what an array is, you can emulate it with a hash, but that doesn't make it an array.

      If you your arrays are more more 1 dimentional then the ARRAY data type really looses all of its advantage over the HASH data type, as it can no longer be used directly as a list.

      I don't know what this means, but unless you're talking about the need to dereference nested arrays (which you also have to do with nested hashes), I think it's wrong too.

        No ... wait ... silly post was posted on Mar 31, but rational sane response was posted on Apr 1 - April Fool's Day. Where's the laughter if there's no joke? This is a paradox wrapped in an enigma wrapped in a conundrum! Ahhh, my ... brains!!! AAAAAAAAAAAAAAAAAHHHHHHHHHHHHHH!!!