Re: (stephen) Hash Tutorial

in reply to Hash Tutorial

Just as Perl is the Swiss Army Chainsaw of languages, the hash is the Leatherman and Krazy Glue of data structures. Whenever you're thinking of the following words, think hash:

"look up" -- As in "For each type of pet in the pet store, the system has to look up the kind of food they eat."
"unique" -- As in "I want a list of each unique kind of pet we have in this pet store."
"check if it's in X" -- As in "I want to check that the kind of pet we enter is in our list of approved pets."
"named" -- As in "I want to be able to access the pet's species through something named 'species', its collar size through something named 'collar size', etc."
"choose" -- As in "Based on what kind of pet it is, we should choose which subroutine to run."
Others I haven't thought of -- As in "It sure would be nice to have an exhaustive list, but I don't."

"look up"

For example, say you've got a type of pet (dog, cat, etc.), and you want to look up what they eat. A hash is perfect for this:

my %Pet_Food = (
  'dog' => 'dog chow',
  'cat' => 'cat chow',
  'parakeet' => 'birdseed'
);

# $pet_type should be 'dog', 'cat', or 'parakeet'
my $food = $Pet_Food{$pet_type};

If $pet_type is 'dog', then $food winds up being 'dog chow', etc.
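
One thing the snippet above glosses over is what happens when $pet_type isn't in the table: you just get undef back. Here's a minimal sketch of one way to handle that. The food_for() helper and the 'unknown' placeholder are my own inventions for illustration, not anything standard:

```perl
use strict;
use warnings;

my %Pet_Food = (
    dog      => 'dog chow',
    cat      => 'cat chow',
    parakeet => 'birdseed',
);

# Hypothetical helper: returns the food for a pet type, or the
# placeholder string 'unknown' when the type is not in the table.
sub food_for {
    my ($pet_type) = @_;
    return exists $Pet_Food{$pet_type}
        ? $Pet_Food{$pet_type}
        : 'unknown';
}

print "dog eats: ",   food_for('dog'),   "\n";   # dog eats: dog chow
print "lemur eats: ", food_for('lemur'), "\n";   # lemur eats: unknown
```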

"unique"

Next, say you've got a list of all of the types of pets in the store. However, since it's just an inventory list (somebody went down the cages and typed "dog" for every dog, for some reason), you want to eliminate the thousand-and-one duplicates that are there. Hashes to the rescue. Hash keys are stored only once per hash, so you can get a list of unique names, eliminating the duplicates, like so:

my @Pet_Types = ("dog", "cat", "dog", "parakeet", "dog", "cat");

my %type_table = ();
foreach my $type ( @Pet_Types ) {
  $type_table{$type}++;
}
my @unique_types = sort keys %type_table;
# @unique_types winds up with 'cat', 'dog', and 'parakeet'
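
If you'd rather keep the types in the order they first showed up instead of sorting them, a common variation is to filter with grep and let the hash remember what it has already seen. A rough sketch (the names %count and @in_order are mine); as a bonus, the hash ends up holding how many of each pet were on the list:

```perl
use strict;
use warnings;

my @Pet_Types = ('dog', 'cat', 'dog', 'parakeet', 'dog', 'cat');

my %count;
# grep keeps a type only the first time it is seen; along the way
# %count accumulates the number of occurrences of each type.
my @in_order = grep { !$count{$_}++ } @Pet_Types;

print "@in_order\n";            # dog cat parakeet
print "dogs: $count{dog}\n";    # dogs: 3
```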

"check if it's in X"

Say you've got a pet-store application where the user is supposed to enter the type of pet they're looking for. You want to make sure they didn't type 'dgo' by accident when they meant 'dog', so you check the input against a hash of valid pets:

my %Valid_Pets = (
   'dog' => 1,
   'cat' => 1,
   'parakeet' => 1,
   'lemur' => 1
);

# $pet_type should contain 'dog', 'cat', 'lemur' etc...
# but might not.. we die if it doesn't
exists $Valid_Pets{$pet_type}
  or die "I'm sorry, '$pet_type' is not a valid pet\n";
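
Typing '=> 1' over and over gets old once the approved list grows. If the valid pets already live in an array, you can build the same kind of lookup hash with map. This is only a sketch; @approved and is_valid_pet() are names I made up for the example:

```perl
use strict;
use warnings;

my @approved   = qw(dog cat parakeet lemur);
my %Valid_Pets = map { $_ => 1 } @approved;   # each pet becomes a key

# Hypothetical helper: returns 1 if the type is approved, 0 if not.
sub is_valid_pet {
    my ($pet_type) = @_;
    return exists $Valid_Pets{$pet_type} ? 1 : 0;
}

foreach my $pet_type ( 'dog', 'dgo' ) {
    print "$pet_type: ",
        ( is_valid_pet($pet_type) ? "ok" : "not a valid pet" ), "\n";
}
```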

"named"

Say you have a data file like this:

name=fluffy species=rabbit weight=5 price=10.00
name=fido species=dog weight=15 price=30.15
name=gul_ducat species=cat weight=10 price=40.20

Frequently, folks reading through files like this say, "Well, I just want to access this stuff by name. I don't know if we're going to start recording serial numbers or ancestry or other stuff, so I just want to have the 'name' field automatically stored in $name, 'species' in $species, etc." (There's a way to do it, but it's a bad thing to do.) Then they post messages on PerlMonks asking how to automatically call variables by name, and kick off a debate as a bunch of people say "use hashes", then one or two people tell them black-magic techniques, and it turns into a mess.

Save yourself (and the rest of us :) ) the time and trouble and use hashes whenever you want to call something by name. Read each record in the file into a hash, and access the parameters by saying things like $pet{'species'}, $pet{'price'}, and so on. You'll be that much closer to an object-oriented program, and you won't have that impossible-to-find bug when your parameter named 'x' collides with $x elsewhere in the program. You can do something like this:

while (<PET_FILE>) {
  my %pet = parse_pet($_);
  print "Name: $pet{'name'}\n";
}

sub parse_pet {
  my ($line)  = @_;
  my @param_pairs = split(' ', $line);  # ' ' splits on runs of whitespace
  my %params = ();
  foreach my $param ( @param_pairs ) {
    my ($name, $value) = split(/=/, $param, 2);
    $params{$name} = $value;
  }
  return %params;
}

That way, everything about the pet is stored in a single variable, and you don't have a bunch of data running around loose like hamsters escaped from their cages. (Okay, so I'm stuck in the theme.)

Most folks would return a hash reference from the subroutine instead of the entire hash, which avoids copying every key and value on the way out. The concept is the same:

while (<PET_FILE>) {
  my $pet = parse_pet($_);
  print "Name: $pet->{'name'}\n";
}

sub parse_pet {
  my ($line)  = @_;
  my @param_pairs = split(' ', $line);  # ' ' splits on runs of whitespace
  my %params = ();
  foreach my $param ( @param_pairs ) {
    my ($name, $value) = split(/=/, $param, 2);
    $params{$name} = $value;
  }
  return \%params;
}
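
The payoff comes when somebody does start recording serial numbers: nothing in the parsing code has to change, and you can walk over whatever fields a record happens to have. A small sketch of that, assuming a made-up 'serial' field that is not in the data file above, and splitting on ' ' (my choice here) so stray whitespace is skipped:

```perl
use strict;
use warnings;

# Same idea as parse_pet above, returning a hash reference.
sub parse_pet {
    my ($line) = @_;
    my %params;
    foreach my $param ( split ' ', $line ) {
        my ($name, $value) = split /=/, $param, 2;
        $params{$name} = $value;
    }
    return \%params;
}

# 'serial' is a hypothetical new field added for illustration.
my $pet = parse_pet("name=fido species=dog weight=15 price=30.15 serial=A113\n");

# Print every field the record has, whatever the fields are called.
foreach my $field ( sort keys %$pet ) {
    print "$field: $pet->{$field}\n";
}
```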

"choose"

Kind of an advanced technique, but if you need to choose between a thousand alternate things to do based on the value of a single string, it's generally best to use a hash (unless you can use object-oriented programming and subclassing, but that's another tale). Say, for example, that you want to print a different page based on the species that a customer bought. You could, of course, have a billion-and-one if/elsif statements, like so:

# Note: Bad code! No krispy kreme!
if ( $pet_type eq 'dog' ) {
  print_dog_page();
}
elsif ( $pet_type eq 'cat' ) {
  print_cat_page();
}
elsif ( $pet_type eq 'lemur' ) {
  print_lemur_page();
}
# .. ad nauseam

Instead, it's much better to have a hash table of pet types, plus references to subroutines to call in various situations:

my %Pet_Pages = (
  dog => \&print_dog_page,
  cat => \&print_cat_page,
  lemur => \&print_lemur_page,
);

my $page_sub = $Pet_Pages{$pet_type}
   or die "Invalid pet type\n";
$page_sub->();

That way, you don't need to go rappelling down a huge if/elsif chain every time you want to add or remove a pet page. It's a powerful technique, although it can be misused. (Don't use it in place of a simple if/elsif, for example.)
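
For anyone who wants to run the idea end to end, here's a self-contained sketch. The three page subs are stand-ins I made up that just return a string, and page_for() is a hypothetical wrapper that falls back to a message instead of dying when the type has no entry:

```perl
use strict;
use warnings;

# Stand-in page subs (hypothetical); real ones would print a page.
sub dog_page   { return "All about dogs" }
sub cat_page   { return "All about cats" }
sub lemur_page { return "All about lemurs" }

my %Pet_Pages = (
    dog   => \&dog_page,
    cat   => \&cat_page,
    lemur => \&lemur_page,
);

# Look up the sub for this pet type and call it, or return a
# fallback message when the type is not in the table.
sub page_for {
    my ($pet_type) = @_;
    my $page_sub = $Pet_Pages{$pet_type}
        or return "No page for '$pet_type'";
    return $page_sub->();
}

print "cat: ", page_for('cat'), "\n";   # cat: All about cats
print "dgo: ", page_for('dgo'), "\n";   # dgo: No page for 'dgo'
```

Adding a new species is then one new line in %Pet_Pages, with no change to page_for().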

Basically, hashes give incredible flexibility. Combine this with references, and you can have hashes of hashes, and hashes of hashes of hashes (of arrays), until you have data types of whatever structure and complexity you want.
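
To make that a bit more concrete, here's a rough sketch of a hash of hashes where one of the inner values is an array reference. The 'toys' field and its contents are invented purely for illustration:

```perl
use strict;
use warnings;

# Outer hash keyed by pet name; each value is a reference to an
# inner hash, and 'toys' (hypothetical) holds an array reference.
my %Pets = (
    fluffy => { species => 'rabbit', weight => 5,  toys => [ 'carrot', 'ball' ] },
    fido   => { species => 'dog',    weight => 15, toys => [ 'bone' ] },
);

print "fido is a $Pets{fido}{species}\n";

# Add to the nested array by dereferencing it.
push @{ $Pets{fido}{toys} }, 'frisbee';

foreach my $name ( sort keys %Pets ) {
    my $toys = join ', ', @{ $Pets{$name}{toys} };
    print "$name plays with: $toys\n";
}
```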

Note: Code not tested.
Update: Fixed typo in code and added hashref example.

stephen

In Section Seekers of Perl Wisdom