http://qs1969.pair.com?node_id=476966

hmbscully has asked for the wisdom of the Perl Monks concerning the following question:

After nearly five years of working with Perl, I must admit, I still cannot grasp when or how to use a hash. My chapter 5 of the llama is just one big block of highlighting and pencil question marks.

Here's the situation that has made me think it's time to figure this out:

I have a registration form collecting, among other things, standard address information from the user. The form data is then dealt with via a script. In the end, the script needs to write the data to one of 7 different csv files based on which state was submitted. I have a list of the states that will comprise each file.

In my head it makes perfect sense that I should look at the state and then if state = blah, then write that line to file = eh. But I can't understand or see what structure I should use to do this. Should I put each list of states into an array? Somehow I think there should be a hash used.

Many TIA. ~WD


Replies are listed 'Best First'.
Re: Use Hash? Array? Still confused after all these years.
by chromatic (Archbishop) on Jul 21, 2005 at 19:30 UTC

    It depends.

    If you were writing a dictionary, would you put all of the words in a list and all of the definitions in another list, then tell people to look for a word, count its position in the list, and then look for the definition in the other list at the corresponding position?

    Instead, you'd probably invent something a lot like an existing dictionary, where you look up a definition directly by way of its word.

    That's a hash, more or less. Maybe it's a little less efficient to flip through a few pages to find the right word and the right spot on the page, especially if you remember that the definition you want is always number 383, but it has a lot of benefits as well.

    How does this help you now? If you have a situation where you can arrange your data in terms of looking something up by a known value, you can use a hash. In dictionary terms, the name of the state could be the word and the name of the output file (or the filehandle or whatever) could be the definition. Every time you process a record, you look up the name of the output file in a hash using as a key the name of the state from the record.
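
    As a minimal sketch of that lookup (the state names, file names, and the file_for_state helper are placeholders invented for illustration, not anything from the original question):

```perl
use strict;
use warnings;

# Hypothetical state-to-file table; the "word" is the state,
# the "definition" is the output file.
my %file_for = (
    'Oregon'   => 'west.csv',
    'New York' => 'east.csv',
);

# Look up the output file for one record's state.
sub file_for_state {
    my ($state) = @_;
    return $file_for{$state};    # undef when the state is not in the table
}

my $file = file_for_state('Oregon');
print "$file\n";
```

    A state missing from the table yields undef, so real code would probably want to check for that before opening anything.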

    Does that help?

      If a dictionary were like a hash, the words wouldn't be in order! (And with newer dictionaries, the order would be different each time you open the dictionary, unless the environment variable DICTIONARY_SEED is set to 0).
Re: Use Hash? Array? Still confused after all these years.
by radiantmatrix (Parson) on Jul 21, 2005 at 20:02 UTC

    The key to your dilemma lies in another term for hash: "Associative Array". A normal array is a series of items associated with a number indicating their position, like this:

    my @array = ('one','two','three');
    ## Results in:
    ## |___0___|___1___|___2___|
    ## |  one  |  two  | three |

    A hash (associative array) uses strings rather than numerical indexes as keys. In other words, instead of using an index number to locate the data, you associate a string "key" with each data value. This works especially well when you have 1:1 relationships in your data. For example, when I have a list of user IDs which each have a home directory associated with them:

    my %hash = (
        'radiantmatrix' => '/home/2004/radiantmatrix/',
        'somedude'      => '/home/2009/somedude',
    );
    ## Results in:
    ## |_radiantmatrix____________|_somedude___________|
    ## |/home/2004/radiantmatrix/ | /home/2009/somedude|

    So, instead of saying $array[1] ("element of @array with index of '1'"), you can now use the association like $hash{'radiantmatrix'} ("element of %hash associated with the key 'radiantmatrix'").

    You can build more complicated data structures by having, for example, hash keys which point to array references (or even other hash references). For example, if I have usernames and each user has a home directory and a phone number, I could do:

    my %hash = (
        'radiantmatrix' => {
            home  => '/home/2004/radiantmatrix/',
            phone => '52123',
        },
        'somedude' => {
            home  => '/home/2009/somedude',
            phone => '52124',
        },
    );

    Now, when I ask for $hash{'somedude'}, the value is another hash (well, a reference to a hash). So, I can say $hash{'somedude'}{'phone'} ("the element with the key of 'phone', inside the element with the key of 'somedude', inside %hash", or "the 'phone' of 'somedude' from %hash").

    So remember, you use a hash (associative array) when you have non-numeric keys (that is, strings) you want to associate with other data.

    As a quick note, nothing prevents you from using numbers as hash keys, either, since Perl ignores the difference. Sometimes this is useful when you have numerical indexes, but there are big gaps in your data. If you have values to associate with the numbers 0,1,2,3,32000,32001 you could store that in a normal array, but perl will allocate memory for 32002 slots. By using a hash, you'd only use memory for values that you have an index for.
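
    A small sketch of that difference, using the same six indexes (the stored values are made up):

```perl
use strict;
use warnings;

my @indexes = (0, 1, 2, 3, 32000, 32001);

# Array: assigning to index 32001 extends the array to that length.
my @array;
$array[$_] = "value $_" for @indexes;

# Hash: only the keys actually assigned exist.
my %hash;
$hash{$_} = "value $_" for @indexes;

my $array_slots  = scalar @array;        # highest index + 1
my $hash_entries = scalar keys %hash;    # number of stored keys

print "array slots: $array_slots, hash entries: $hash_entries\n";
# prints "array slots: 32002, hash entries: 6"
```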

    <-radiant.matrix->
    Larry Wall is Yoda: there is no try{} (ok, except in Perl6; way to ruin a joke, Larry! ;P)
    The Code that can be seen is not the true Code
    "In any sufficiently large group of people, most are idiots" - Kaa's Law
Re: Use Hash? Array? Still confused after all these years.
by kirbyk (Friar) on Jul 21, 2005 at 19:33 UTC
    In general, if you want to group things for lookup, you're talking about a hash. If you want things in order, you're talking about an array.

    In this case, as with many real world problems, you sort of want a little of both - you want a hash of arrays.

    At the top level, you want a hash keyed off of state. For the value of each key, you want a list of address information for that state: an array of addresses. Then, your printout loop looks kind of like:

    foreach my $state (keys %addresses) {
        foreach my $address (@{ $addresses{$state} }) {
            # %filehandle is assumed to map each state to an open filehandle
            print { $filehandle{$state} } $address, "\n";
        }
    }
    (Obviously leaving out some steps, like creating your filehandles and building up the data structure. But see how simple that organization makes it to deal with?)
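
    One way the omitted steps might look, as a sketch only: the records are invented, and in-memory buffers stand in for the real CSV files so the example runs without touching disk.

```perl
use strict;
use warnings;

# Hypothetical records: [ state, address line ].
my @records = (
    [ 'Oregon',   '1 Main St, Portland' ],
    [ 'New York', '2 Broadway, New York' ],
    [ 'Oregon',   '3 Oak Ave, Salem' ],
);

# Build the hash of arrays: state => [ address, address, ... ].
my %addresses;
for my $record (@records) {
    my ($state, $address) = @$record;
    push @{ $addresses{$state} }, $address;
}

# One filehandle per state. In-memory buffers stand in for real files.
my (%buffer, %filehandle);
for my $state (keys %addresses) {
    $buffer{$state} = '';
    open $filehandle{$state}, '>', \$buffer{$state}
        or die "Cannot open buffer for $state: $!";
}

# The printout loop from the post above.
for my $state (sort keys %addresses) {
    for my $address (@{ $addresses{$state} }) {
        print { $filehandle{$state} } $address, "\n";
    }
}
close $_ for values %filehandle;

print $buffer{'Oregon'};
```

    The push onto @{ $addresses{$state} } relies on autovivification: the array for a state is created the first time that state is seen.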

    -- Kirby, WhitePages.com

Re: Use Hash? Array? Still confused after all these years.
by brian_d_foy (Abbot) on Jul 21, 2005 at 22:21 UTC

    Use a hash when you want to look something up by a label you give it.

    The name "hash" is unfortunate because it's really just an implementation detail that it's a hash. The previous Perl name was "associative array" because you associated the value and the label you attached to it. Other languages call the same thing "dictionaries".

    For your specific problem, you may want to represent the state information in a way that makes sense to you and that you can easily read and modify. Since the files comprise the state data, the easiest way to do that may be to use the file name as the key (that's the label) and the list of states as the value (in an anonymous array, which you already hinted at in your question, so you're already thinking about this way, which means you get more than you give yourself credit for ;).

    %State_files = (
        'midwest.txt' => [ qw(Illinois ...) ],
        'east.txt'    => [ ('New York', ... ) ],
        ...
    );

    Once you figure that out, you need to make it so you can start from a state and get back to the file. You don't have to do this all with one hash, so make a reverse mapping. Write a little bit of code that turns %State_files into something that uses the state name as the key so you can use it to look up the file name. You'll notice that the Llama has a couple of examples of reverse mapping. :)

    %File_states = (
        Illinois   => 'midwest.txt',
        'New York' => 'east.txt',
        ...
    );
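
    A sketch of that "little bit of code", assuming a small %State_files with made-up state lists (the real lists are elided in the post and are not filled in here):

```perl
use strict;
use warnings;

# Hypothetical file => [ states ] table in the shape described above.
my %State_files = (
    'midwest.txt' => [ 'Illinois', 'Ohio' ],
    'east.txt'    => [ 'New York', 'Maine' ],
);

# Reverse mapping: state => file.
my %File_states;
for my $file (keys %State_files) {
    for my $state (@{ $State_files{$file} }) {
        $File_states{$state} = $file;
    }
}

print $File_states{'New York'}, "\n";
```

    If a state were listed under two files, whichever file is visited last would win, so the state lists should not overlap.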
    --
    brian d foy <brian@stonehenge.com>
Re: Use Hash? Array? Still confused after all these years.
by Eimi Metamorphoumai (Deacon) on Jul 21, 2005 at 19:24 UTC
    Of course, TMTOWTDI, but it seems like a hash would definitely be in order. I'd use the states for keys, and the files for values. So
    my %lookup = (
        "Alabama" => "southern.txt",
        "Alaska"  => "northern.txt",
        ....
    );
    my $file = $lookup{$input_state};
      This is a bit more advanced, but it lets you assign a list of states to a region, and then generates a hash keyed by state name from it.
      # Make a hash of regions, pointing to an array of states in each region.
      # Use qw() to quote words.
      # Use _ for spaces, replace with space when turning hash inside out.
      my %regions = (
          southern => [ qw( Alabama Georgia Tennessee West_Virginia ) ],
          western  => [ qw( Oregon Washington California Idaho Montana ) ],
          # ...
      );

      # Turn %regions inside out.
      my %states;    # Winds up with data like $states{Oregon} = 'western'
      foreach my $region (keys %regions) {
          foreach my $state (@{ $regions{$region} }) {
              # Copy the name and replace _ with spaces: West_Virginia becomes West Virginia
              (my $name = $state) =~ s/_/ /g;
              $states{$name} = $region;    # Store the state info.
          }
      }


      TGI says moo

      Thanks for all the replies. It's a lot of good stuff to digest, but luckily, this first entry helped me understand the basic concept and deal with the question at hand. Now to try and get some more work done so I have time to revisit Learning Perl.
Re: Use Hash? Array? Still confused after all these years.
by exussum0 (Vicar) on Jul 21, 2005 at 20:06 UTC
    On a user level, not behind the scenes, a hash and an array are exactly the same thing, except that a hash takes arbitrary strings/scalars as its keys and an array uses only non-negative integers. There are nuances to how they are used in Perl, but in a general sense this is true.

    If you can easily map everything to the non-negative integers from 0 through some length/count of the items you have, an array is great. An example is an already sorted list of names. Want to know who's 5th? Just do $names[4], since indexes start at 0. Arrays hold order very well, since they map to non-negative integers and are kinda designed as a contiguous data structure.

    If you need arbitrary access to information where order or mapping isn't easy or doesn't make sense, a hash is great. For instance, if I put my name, address and other personal info in a single structure, a hash rocks. $person{first_name} would give me a person's first name.

    To give the counter examples of when they suck, keeping a sorted list of people in a hash would be a little odd. I can do $person{5}, but what would $person{beer} mean in that context? Should it be allowed? It's just strange.

    For data that has no natural mapping to integer positions, an array kinda sucks. For instance, $person[0] being the first name and $person[1] being the last isn't as natural as $person{first_name}.

    The rest is just engineering. They are both fairly fast, it's a matter of seeing which tool fits the job better.

    Update: Wtf.. I used number when I meant integer. I'm an idiot. What I meant and what I typed were inconsistent. Thanks brian_d_foy.

    ----
    Give me strength for today.. I will not talk it away..
    Just for a moment.. It will burn through the clouds.. and shine down on me.

      Well, hashes aren't exactly the same thing. Arrays are ordered, and hashes aren't. Many people still wonder why keys() doesn't return the keys in the order they added them (or even the same order all the time), and it will always be something I have to emphasize in a beginning Perl class.
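
      A short sketch of the usual workaround: since keys() makes no promise about order, sort the keys whenever a stable order matters. The table contents here are placeholders.

```perl
use strict;
use warnings;

my %file_for = (
    Oregon => 'west.csv',
    Maine  => 'east.csv',
    Iowa   => 'midwest.csv',
);

# keys() order is unspecified and may differ between runs;
# sorting gives a predictable order.
my @states = sort keys %file_for;
print "@states\n";    # prints "Iowa Maine Oregon"
```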

      I don't like to say that arrays are indexed by non-negative integers either (and I've always wondered why people use "integer" just to ignore half of them). We can use the negative indices to count from the end (as long as we have enough entries to count).

      --
      brian d foy <brian@stonehenge.com>
        You are right on all counts. In C, you create a buffer overflow (underflow?) unless you are careful with your pointer arithmetic. PHP I believe doesn't mind. Perl, yeah.

        It's fairly universal to say non-negative numbers though, on the concept level. But that's a preference in teaching the concepts, eh? :)

        ----
        Give me strength for today.. I will not talk it away..
        Just for a moment.. It will burn through the clouds.. and shine down on me.

        We still index arrays using non-negative integers. Sure, you can use negative integers to count from the end, but $array[-$N] is just syntactic sugar for $array[@array-$N]. If we truly could use negative integers to index, the following would not be an error:
        my @array = (1);
        $array[-2] = 2;
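
        A sketch that shows both halves of this exchange: reading with a negative index works, while assigning before the start of the array is a fatal run-time error, trapped here with eval so the script can report it. The variable names are illustrative.

```perl
use strict;
use warnings;

my @array = (10, 20, 30);

# $array[-1] is the last element, the same as $array[$#array].
my $last = $array[-1];

# Assigning before the start of the array dies at run time; eval traps it.
my @short = (1);
my $ok = eval { $short[-2] = 2; 1 };

print "last: $last\n";
print "assignment failed: $@" unless $ok;
```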
Re: Use Hash? Array? Still confused after all these years.
by monarch (Priest) on Jul 22, 2005 at 00:55 UTC
    Programming is incidental to the art of data design.

    In any project it is the organisation of your data and the structure in which you keep the data that largely dictates how your program is to be built.

    Think of the data structure as that invisible foundation under the ground of a sky scraper. It takes the most time to construct, and has almost nothing to show for it. But a good data structure (foundation) allows for an impressive program (tower) to be built upon it.

    If your data structure (foundation) is weak and not-well understood by the programmer, then the program is bound to be flawed. Badly flawed.

    It is worth spending a lot of time visualising how to structure your data. Whether it is a hash or an array comes down to how you will want to be accessing your data. But what are the relationships between the data?

    Spend lots of time in the shower thinking about it, or walks in the park, or listening to your favourite music. Then take good long breaks and rest your mind and come back to it later. It may not feel productive, but in the end you'll have a flash of inspiration and understand your data structure, and be set for writing the best program ever.

Re: Use Hash? Array? Still confused after all these years.
by chas (Priest) on Jul 21, 2005 at 19:46 UTC
    The trouble with arrays for your example is that arrays are indexed by non-negative integers, while you really probably want to index by state. If you define arrays @states and @files where $states[$i] and $files[$i] correspond for each $i, you can likely use this for writing your data, but if you want to look up the filename by state, then you'll have to iterate through the whole @states array to find the index ... which is, at best, awkward.
    chas
    (Update: I guess someone already said this while I was responding...)
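
    For comparison, a sketch of the parallel-array scan next to the hash version. The data and the file_by_scan helper are invented; the hash slice is one way to turn two parallel arrays into a lookup table.

```perl
use strict;
use warnings;

my @states = ('Oregon', 'New York', 'Illinois');
my @files  = ('west.csv', 'east.csv', 'midwest.csv');

# Parallel arrays: scan @states to find the index, then use it in @files.
sub file_by_scan {
    my ($wanted) = @_;
    for my $i (0 .. $#states) {
        return $files[$i] if $states[$i] eq $wanted;
    }
    return;    # not found
}

# Hash slice: pair each state with the file at the same position.
my %file_for;
@file_for{@states} = @files;

print file_by_scan('Illinois'), "\n";
print $file_for{'Illinois'}, "\n";
```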
Re: Use Hash? Array? Still confused after all these years.
by tcf03 (Deacon) on Jul 22, 2005 at 12:35 UTC
    After nearly five years of working with Perl, I must admit, I still cannot grasp when or how to use a hash.

    I had the same problem - the concepts just eluded me. What really helped me was roy johnson's node Implementing Dispatch Tables. That node in particular was useful, but it was not what made it click in my head - rather, it was the concept of the dispatch table. I'm probably not the person to best explain them to you - I suggest reading Higher Order Perl. HOP started off with just enough information (which was over my head) to really get me interested in the problems, which in turn led to a better understanding of the underlying concepts - which is what I really wanted to get at anyway. Anyway - I don't know if this helps, but I certainly connect with the statement above about grasping the concept of hashes. Good luck.
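
    For readers who have not met the term: a dispatch table is a hash whose values are code references, so a string picks the code to run. A minimal sketch, with command names and a run helper made up for illustration (this is not code from the node or the book mentioned above):

```perl
use strict;
use warnings;

# A dispatch table: keys are command names, values are subroutines.
my %dispatch = (
    double => sub { return $_[0] * 2 },
    square => sub { return $_[0] ** 2 },
);

# Look up the handler by name and call it.
sub run {
    my ($command, $value) = @_;
    my $handler = $dispatch{$command}
        or die "Unknown command: $command\n";
    return $handler->($value);
}

print run('double', 21), "\n";    # prints 42
print run('square', 5),  "\n";    # prints 25
```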

    Ted
    --
    "That which we persist in doing becomes easier, not that the task itself has become easier, but that our ability to perform it has improved."
      --Ralph Waldo Emerson
Re: Use Hash? Array? Still confused after all these years.
by NateTut (Deacon) on Jul 22, 2005 at 14:58 UTC
    Coming from a database background, I think of a hash as a row from a database table. The keys are analogous to the column names while obviously the hash values correspond to row values. When I need to deal with table data I use an array of hashes. Check out Data Structures Tutorial as well as Data Dumper for more information.
      Coming from a database background, *I* think of a hash as a database index. The keys are analogous to the primary key values while obviously the hash values correspond to a row of fields. When I need to deal with table data I use a hash of arrays.
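
      The two views can sit side by side. A sketch with an invented two-row table, where column names and values are placeholders:

```perl
use strict;
use warnings;

# Array of hashes: each element is one row, keyed by column name.
my @rows = (
    { id => 1, name => 'Ann', state => 'Oregon' },
    { id => 2, name => 'Bob', state => 'Maine'  },
);

# Hash of arrays: primary key => list of fields (name, state).
my %fields_by_id;
for my $row (@rows) {
    $fields_by_id{ $row->{id} } = [ $row->{name}, $row->{state} ];
}

print $rows[0]{name}, "\n";         # first row, "name" column
print $fields_by_id{2}[1], "\n";    # row with id 2, second field
```

      The first form reads well when you walk the rows in order; the second is handier when you need to jump straight to a row by its key.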
Re: Use Hash? Array? Still confused after all these years.
by techcode (Hermit) on Jul 22, 2005 at 23:13 UTC
    Everyone posted their own views of how and when to use hashes. Problem is that it's really hard to explain in plain words.

    I believe it's hard because it either clicks in your head or it doesn't. Well, at least that's the case with me - once I was introduced to hashes I simply took them for granted.

    As someone pointed out - you use hashes for storing any data that has names, such as database tables and forms. Sometimes it's also good to use a hash as a parameter for some function/method (often referred to as a PARAMHASH), in cases when you have more than a few parameters.
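
    A sketch of the named-parameter idea, assuming a made-up describe_record function and parameter names. Listing the defaults first and @_ after lets the caller's values override them.

```perl
use strict;
use warnings;

# Hypothetical function taking named parameters as a hash.
sub describe_record {
    my %args = (
        file => 'default.csv',    # default, overridden by the caller
        @_,
    );
    return "$args{name} ($args{state}) -> $args{file}";
}

print describe_record(name => 'Ann', state => 'Oregon'), "\n";
print describe_record(name => 'Bob', state => 'Maine', file => 'east.csv'), "\n";
```

    The caller can pass the parameters in any order and leave out the ones that have defaults.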

    Anyway, if I understood you, I would use the following:
    - hash for storing the form data > you get it from CGI that way
    - hash for storing file names. As keys you have state options, and as matching values you have file names.

    So it ends up with something like:

    use CGI;

    my $request = CGI->new();

    # In scalar context, Vars() returns a reference to a hash of the form parameters
    my $values = $request->Vars();

    # Or maybe coming from a config file
    my %files = (
        state1  => 'filename1.csv',
        state2  => 'filename2.csv',
        default => 'default_filename.csv',
    );

    my $file_name = $files{default};
    if (defined $files{ $values->{state} }) {
        $file_name = $files{ $values->{state} };
    }

    # Or something shorter like:
    # my $file_name = $files{ $values->{state} } || $files{default};

    #open....

      You've gone 5 years without learning hashes? Pick up a copy of Learning Perl, and work through it all the way. It will serve you well. Conceptually, you'll want to do something like this:
      # first, tell perl you want a local version of a hash called filename
      my %filename;

      # put in the data for your states
      %filename = (
          state1 => "FileA.csv",
          state2 => "FileB.csv",
          state3 => "FileC.csv",
      );

      # The above statement is the same as writing the three commented lines
      # below. It's preferred because it's simpler, and requires less typing,
      # so it has lower odds for a mistake. Perhaps the lines below are
      # clearer to a beginner, though... they do the same thing as the line above.
      #$filename{"state1"}="FileA.csv";
      #$filename{"state2"}="FileB.csv";
      #$filename{"state3"}="FileC.csv";

      # get your state variable somehow... say from a function called get_state()
      my $state = get_state();
      my $file  = $filename{ $state };

      # The above statement just looks for a matching value for $state in the
      # hash. If it can't find one, it uses the undefined value instead. It
      # assigns whatever value it came up with to $file. It's like writing the
      # following big if statement, but again, with less repetition...
      # if ( $state eq "state1" ) {
      #     $file = "FileA.csv";
      # } elsif ( $state eq "state2" ) {
      #     $file = "FileB.csv";
      # } elsif ( $state eq "state3" ) {
      #     $file = "FileC.csv";
      # } else {
      #     $file = undef();
      # }
        I appreciate your response, but in my question I did note that I do have Learning Perl, aka the llama, I've had a copy forever. I know I need to work it through again.