Just as Perl is the Swiss Army Chainsaw of languages, the hash is the Leatherman and Krazy Glue of data structures. Whenever you're thinking of the following words, think hash:
  • "look up" -- As in "For each type of pet in the pet store, the system has to look up the kind of food they eat."
  • "unique" -- As in "I want a list of each unique kind of pet we have in this pet store."
  • "check if it's in X" -- As in "I want to check that the kind of pet we enter is in our list of approved pets."
  • "named" -- As in "I want to be able to access the pet's species through something named 'species', its collar size through something named 'collar size', etc."
  • "choose" -- As in "Based on what kind of pet it is, we should choose which subroutine to run."
  • Others I haven't thought of -- As in "It sure would be nice to have an exhaustive list, but I don't."

"look up"

For example, say you've got a type of pet (dog, cat, etc.), and you want to look up what they eat. A hash is perfect for this:

    my %Pet_Food = (
        'dog'      => 'dog chow',
        'cat'      => 'cat chow',
        'parakeet' => 'birdseed',
    );

    # $pet_type should be 'dog', 'cat', or 'parakeet'
    my $food = $Pet_Food{$pet_type};
If $pet_type is 'dog', then $food winds up being 'dog chow', etc.
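
One thing to watch for: looking up a key that isn't in the hash quietly gives you undef instead of an error. Here's a minimal sketch of guarding against that with exists. The food_for helper and the 'pet food' fallback are made up for illustration; they aren't part of the example above.

```perl
use strict;
use warnings;

my %Pet_Food = (
    'dog'      => 'dog chow',
    'cat'      => 'cat chow',
    'parakeet' => 'birdseed',
);

# food_for() is a hypothetical helper: it returns the food for a pet type,
# or a generic fallback when the type is not a key in %Pet_Food.
sub food_for {
    my ($pet_type) = @_;
    return exists $Pet_Food{$pet_type} ? $Pet_Food{$pet_type} : 'pet food';
}

print food_for('dog'),   "\n";    # prints "dog chow"
print food_for('lemur'), "\n";    # prints "pet food"
```

Whether a fallback or a die is the right reaction depends on the program; the point is just to decide on purpose.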


"unique"

Next, say you've got a list of all of the types of pets in the store. However, since it's just an inventory list (somebody went down the cages and typed "dog" for every dog, for some reason), you want to eliminate the thousand-and-one duplicates in it. Hashes to the rescue. Each key is stored only once per hash, so you can get a list of unique names, eliminating the duplicates, like so:

    my @Pet_Types = ("dog", "cat", "dog", "parakeet", "dog", "cat");

    my %type_table = ();
    foreach my $type ( @Pet_Types ) {
        $type_table{$type}++;
    }

    my @unique_types = sort keys %type_table;
    # @unique_types winds up with 'cat', 'dog', and 'parakeet'
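
If you'd rather keep the types in the order they first showed up instead of sorting them, a common variation is to filter with grep and a hash of things already seen. This is only a sketch; the name %seen is a convention, not something from the code above.

```perl
use strict;
use warnings;

my @Pet_Types = ("dog", "cat", "dog", "parakeet", "dog", "cat");

# %seen counts how often each type has come up so far; grep keeps a type
# only when its count was still zero, i.e. on its first appearance.
my %seen;
my @unique_types = grep { !$seen{$_}++ } @Pet_Types;

print join(', ', @unique_types), "\n";    # prints "dog, cat, parakeet"
print "dogs counted: $seen{dog}\n";       # prints "dogs counted: 3"
```

As a side effect, %seen ends up holding the count for each type, which is handy if you also wanted to know how many dogs there are.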

"check if it's in X"

Say you've got a pet-store application where the user is supposed to enter the type of pet they're looking for. You want to make sure they didn't type 'dgo' by accident when they meant 'dog', so you can just check their entry against a hash of valid pets:

    my %Valid_Pets = (
        'dog'      => 1,
        'cat'      => 1,
        'parakeet' => 1,
        'lemur'    => 1,
    );

    # $pet_type should contain 'dog', 'cat', 'lemur', etc.,
    # but might not; we die if it isn't a key in %Valid_Pets
    $Valid_Pets{$pet_type}
        or die "I'm sorry, '$pet_type' is not a valid pet\n";
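
If the approved pets already live in a list, you don't have to type out all the '=> 1' pairs by hand. A rough sketch of building the lookup hash with map and testing with exists follows; @approved and is_valid_pet are names I've invented for the example.

```perl
use strict;
use warnings;

my @approved = qw(dog cat parakeet lemur);

# Turn the list into a lookup hash: every approved type becomes a key.
my %Valid_Pets = map { $_ => 1 } @approved;

# is_valid_pet() is a hypothetical helper name. exists() asks only whether
# the key is present, so it also works if a value happens to be 0 or ''.
sub is_valid_pet {
    my ($pet_type) = @_;
    return exists $Valid_Pets{$pet_type} ? 1 : 0;
}

for my $pet_type (qw(dog dgo)) {
    printf "%s: %s\n", $pet_type,
        is_valid_pet($pet_type) ? 'ok' : 'not a valid pet';
}
```

This prints "dog: ok" and then "dgo: not a valid pet".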


"named"

Say you have a data file like this:

    name=fluffy species=rabbit weight=5 price=10.00
    name=fido species=dog weight=15 price=30.15
    name=gul_ducat species=cat weight=10 price=40.20

Frequently, folks reading through files like this say, "Well, I want to just access this stuff by name. I don't know if we're going to start recording serial numbers or ancestry or other stuff, so I just want to have the 'name' field automatically stored in $name, species in $species, etc." (There's a way to do it, but it's a bad thing to do.) Then they post messages on PerlMonks asking how to automatically call variables by name, and kick off a bunch of debate as a bunch of people say "use hashes", then one or two people tell them black-magic techniques, and it turns into a mess.

Save yourself (and the rest of us :) ) the time and trouble and use hashes whenever you want to call something by name. Read each record in the file into a hash, and access the parameters by saying things like $pet{'species'}, $pet{'price'}, and so on. You'll be that much closer to an object-oriented program, and you won't have that impossible-to-find bug when your parameter named 'x' collides with $x elsewhere in the program. You can do something like this:

    while (<PET_FILE>) {
        my %pet = parse_pet($_);
        print "Name: $pet{'name'}\n";
    }

    sub parse_pet {
        my ($line) = @_;

        # split ' ' skips leading whitespace and treats any run of
        # whitespace (including the trailing newline) as one separator
        my @param_pairs = split(' ', $line);
        my %params = ();
        foreach my $param ( @param_pairs ) {
            my ($name, $value) = split(/=/, $param, 2);
            $params{$name} = $value;
        }
        return %params;
    }
That way, everything about the pet is stored in a single variable, and you don't have a bunch of data running around loose like hamsters escaped from their cages. (Okay, so I'm stuck in the theme.)

Most folks would return a hash reference from the subroutine instead of the entire hash for efficiency reasons. The concept is the same:

    while (<PET_FILE>) {
        my $pet = parse_pet($_);
        print "Name: $pet->{'name'}\n";
    }

    sub parse_pet {
        my ($line) = @_;

        # split ' ' skips leading whitespace and treats any run of
        # whitespace (including the trailing newline) as one separator
        my @param_pairs = split(' ', $line);
        my %params = ();
        foreach my $param ( @param_pairs ) {
            my ($name, $value) = split(/=/, $param, 2);
            $params{$name} = $value;
        }
        return \%params;
    }
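
To see the payoff, here's a self-contained sketch that runs the hashref version over a couple of in-memory lines instead of a file. The 'serial' field on the second record is invented; it's there to show that a new field turns up in the hash without any change to the parsing code.

```perl
use strict;
use warnings;

# Same idea as parse_pet above, returning a hash reference.
sub parse_pet {
    my ($line) = @_;
    my %params;
    foreach my $param ( split ' ', $line ) {
        my ($name, $value) = split /=/, $param, 2;
        $params{$name} = $value;
    }
    return \%params;
}

# Sample records standing in for the data file; the 'serial' field on the
# second line is invented to show that new fields need no code changes.
my @lines = (
    "name=fluffy species=rabbit weight=5 price=10.00\n",
    "name=fido species=dog weight=15 price=30.15 serial=A17\n",
);

my @pets = map { parse_pet($_) } @lines;

foreach my $pet (@pets) {
    my $serial = exists $pet->{'serial'} ? $pet->{'serial'} : 'none';
    print "$pet->{'name'} is a $pet->{'species'} (serial: $serial)\n";
}
```

Try doing that with a separate $name, $species, and $serial variable per field and you'll see why the hash wins.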


"choose"

This is kind of an advanced technique, but if you need to choose between a thousand alternate things to do based on the value of a single string, it's generally best to use a hash (unless you can use object-oriented programming and subclassing, but that's another tale). Say, for example, that you want to print a different page based on the species that a customer bought. You could, of course, have a billion-and-one if/elsif statements, like so:

    # Note: Bad code! No krispy kreme!
    if ( $pet_type eq 'dog' ) {
        print_dog_page();
    }
    elsif ( $pet_type eq 'cat' ) {
        print_cat_page();
    }
    elsif ( $pet_type eq 'lemur' ) {
        print_lemur_page();
    }
    # .. ad nauseam
Instead, it's much better to have a hash table of pet types, plus references to subroutines to call in various situations:

    my %Pet_Pages = (
        dog   => \&print_dog_page,
        cat   => \&print_cat_page,
        lemur => \&print_lemur_page,
    );

    my $page_sub = $Pet_Pages{$pet_type}
        or die "Invalid pet type\n";
    $page_sub->();

That way, you don't need to go rappelling down the huge list of if/thens every time you want to add or remove a pet page. It's a powerful technique, although it can be misused. (Don't use it instead of simple if/thens, for example.)
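
Here's a runnable sketch of the same dispatch-table idea, with a default for pet types that aren't in the table instead of a die. The page subs are stand-ins that return a line of text rather than printing a whole page, and page_for is a wrapper name I've made up.

```perl
use strict;
use warnings;

# Stand-in page subs; the real ones would print a whole page.
sub print_dog_page   { return "Dog care tips\n" }
sub print_cat_page   { return "Cat care tips\n" }
sub print_lemur_page { return "Lemur care tips\n" }

my %Pet_Pages = (
    dog   => \&print_dog_page,
    cat   => \&print_cat_page,
    lemur => \&print_lemur_page,
);

# page_for() is a hypothetical wrapper: look up the sub, fall back to a
# default for unknown types, then call whichever code reference we got.
sub page_for {
    my ($pet_type) = @_;
    my $page_sub = $Pet_Pages{$pet_type}
        || sub { return "No page for '$pet_type'\n" };
    return $page_sub->();
}

print page_for('cat');       # prints "Cat care tips"
print page_for('ferret');    # prints "No page for 'ferret'"
```

Adding a parakeet page is then one new sub and one new line in %Pet_Pages; page_for doesn't change.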

Basically, hashes give incredible flexibility. Combine this with references, and you can have hashes of hashes, and hashes of hashes of hashes (of arrays), until you have data types of whatever structure and complexity you want.
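
For a taste of what that looks like, here's a small sketch of a hash of hashes with an array tucked inside each record. The 'toys' field and its contents are invented for the example; the names and species come from the data file above.

```perl
use strict;
use warnings;

# A hash of hashes keyed by pet name; 'toys' holds an array reference.
# The 'toys' field and its contents are invented for this sketch.
my %Pets = (
    fluffy => { species => 'rabbit', weight => 5,  toys => ['carrot'] },
    fido   => { species => 'dog',    weight => 15, toys => ['ball', 'rope'] },
);

# Reach into the nested structure one key (or index) at a time.
print "fido is a $Pets{fido}{species}\n";
print "fido's first toy: $Pets{fido}{toys}[0]\n";

# Add a new record and walk all of them in name order.
$Pets{gul_ducat} = { species => 'cat', weight => 10, toys => [] };
foreach my $name ( sort keys %Pets ) {
    my $toy_count = scalar @{ $Pets{$name}{toys} };
    print "${name}: $Pets{$name}{species}, $toy_count toy(s)\n";
}
```

Each inner { ... } is a reference to an anonymous hash, which is what lets one hash hold another.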

Note: Code not tested.
Update: Fixed typo in code and added hashref example.


In reply to Re: (stephen) Hash Tutorial by stephen
in thread Hash Tutorial by dhammaBum
