in reply to sorting arrays with common index
You should really invest a little time to get used to hashes: The parallel-array approach is one way you can get the flexibility of a hash, but it's brittle and prone to errors. The hash syntax is just a little trickier than parallel arrays, but once you're used to it, you'll find it simpler overall. To illustrate a little of where I'm coming from, here's a simple program that will print a sorted list of people by age using parallel arrays *and* the same data in an array of hashes:
    $ cat pm_1198871_a.pl
    use strict;
    use warnings;

    # Parallel arrays:
    my @first_name = ('Joe', 'Bob', 'Mary', 'Sue');
    my @last_name = ('Smith', 'Jones', 'Blige', 'Parker');
    my @age = (25, 43, 19, 57);

    # An array of hashes:
    my @people = (
        { first=>'Joe', last=>'Smith', age=>25 },
        { first=>'Bob', last=>'Jones', age=>43 },
        { first=>'Mary', last=>'Blige', age=>19 },
        { first=>'Sue', last=>'Parker', age=>57 },
    );

    # Parallel arrays: Make a sort-by-age list of indices:
    my @indices = sort { $age[$a] <=> $age[$b] } 0 .. $#first_name;

    print "Using parallel arrays, sorted by age\n";
    for my $i (@indices) {
        print "$first_name[$i] $last_name[$i] $age[$i]\n";
    }

    # Array of hashes: Make a sort-by-age list of indices:
    @indices = sort { $people[$a]{age} <=> $people[$b]{age} } 0 .. $#people;

    print "\nUsing a hash, sorted by age\n";
    for my $i (@indices) {
        print "$people[$i]{first} $people[$i]{last} $people[$i]{age}\n";
    }

    $ perl pm_1198871_a.pl
    Using parallel arrays, sorted by age
    Mary Blige 19
    Joe Smith 25
    Bob Jones 43
    Sue Parker 57

    Using a hash, sorted by age
    Mary Blige 19
    Joe Smith 25
    Bob Jones 43
    Sue Parker 57
In both cases, I just created a sorted list of the array indexes containing the data and printed the report from it. As you can see, the code is very similar in structure. I find that the data in the hash section is a lot easier to read because the related items are right next to each other. The sort statement is a little bit simpler in the parallel array section than the array of hashes, but that's an illusion!
There are several reasons that the simplicity of parallel arrays is an illusion. First, we just looked at a very simple case where we wanted to print out the data as a sorted report. But what happens if we really want to sort the data? Let's modify our code to put the data in the actual order we want:
    # Parallel arrays: Sort our data by age
    my @indices = sort { $age[$a] <=> $age[$b] } 0 .. $#first_name;
    @first_name = @first_name[@indices];
    @last_name = @last_name[@indices];
    @age = @age[@indices];

    print "Using parallel arrays, sorted by age\n";
    for my $i (0 .. $#first_name) {
        print "$first_name[$i] $last_name[$i] $age[$i]\n";
    }
In the parallel array version, we still resort to using a list of indices to sort on, then we have to rearrange all the parallel arrays. Immediately the code gets a bit longer. On the other hand, if we're sorting the array of hashes, we don't need to remember a list of indices: we can sort all the data in one step rather than four:
    # Array of hashes: Sort our data by age
    @people = sort { $a->{age} <=> $b->{age} } @people;

    print "\nUsing a hash, sorted by age\n";
    for my $hr (@people) {
        print "$hr->{first} $hr->{last} $hr->{age}\n";
    }
Note that the code became smaller rather than larger. We don't need a list of indices, because we don't have to try to map the changes over multiple data structures. Instead, sort can directly rearrange the array for us.
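To show how far that one-step sort stretches, here's a small sketch that reuses the sample data and sorts by last name, then first name. The two-key ordering and the @by_name variable are my own choices for illustration; the only thing that changes from the age sort is the comparison block.

```perl
use strict;
use warnings;

# Same sample data as in the examples above
my @people = (
    { first=>'Joe',  last=>'Smith',  age=>25 },
    { first=>'Bob',  last=>'Jones',  age=>43 },
    { first=>'Mary', last=>'Blige',  age=>19 },
    { first=>'Sue',  last=>'Parker', age=>57 },
);

# Sort by last name; fall back to first name when last names match
my @by_name = sort {
    $a->{last} cmp $b->{last} || $a->{first} cmp $b->{first}
} @people;

for my $hr (@by_name) {
    print "$hr->{last}, $hr->{first} ($hr->{age})\n";
}
```

With parallel arrays, the same change would still need the index list and one slice per array.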
This post is already going long and I'm getting short on time, so I'll be brief on the other reasons that the simplicity is an illusion:
Any time you're having to manage multiple data structures in concert, you have to remember to do the appropriate actions *everywhere* relevant. The parallel array technique relies on two unstated assumptions:
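A small sketch of the "everywhere relevant" problem, using made-up data: one forgotten push leaves the parallel arrays out of step, and nothing complains until you check. The extra person and the $in_step variable are hypothetical, added only for this illustration.

```perl
use strict;
use warnings;

my @first_name = ('Joe', 'Bob');
my @last_name  = ('Smith', 'Jones');
my @age        = (25, 43);

# Add a (made-up) person, but forget to update @age
push @first_name, 'Ann';
push @last_name,  'Lee';

# Perl raises no error here; we only find out by comparing the lengths
my $in_step = (@first_name == @last_name && @last_name == @age) ? 1 : 0;
print "Parallel arrays in step? ", ($in_step ? "yes" : "no"), "\n";

# With an array of hashes, a record is added in one piece
my @people = (
    { first=>'Joe', last=>'Smith', age=>25 },
    { first=>'Bob', last=>'Jones', age=>43 },
);
push @people, { first=>'Ann', last=>'Lee', age=>31 };
print scalar(@people), " complete records\n";
```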
When you begin with hashes, things will get a little sticky for a little while. But once you're accustomed to them, many things will suddenly get much easier. When you can just lump a complicated thing in a ball and forget about its internals, it opens up more of your brain to think about the larger problems in your programs. It also helps you make reusable chunks of code.
As an example, suppose you added addresses to your collection of "people", so you might have { first=>'Morticia', last=>'Addams', street_num=>131313, street_name=>'Mockingbird Lane', city=>'Perish', st=>'NY', zip=>13131, ... } and include it in your person data. Later, when someone asks to add buildings to your program and you notice that they also have addresses, you could split out your address information into a subhash and simply pass the address part when you call subroutines that deal with addresses. Then you could use those same subroutines with your buildings and not have to worry about sets of parallel arrays and how to mix arrays containing people and arrays containing building information:

    my $ma = {
        first=>'Morticia', last=>'Addams', age=>undef, favorite_color=>'black',
        address=>{ street_num=>131313, street_name=>'Mockingbird Lane',
                   city=>'Perish', st=>'NY', zip=>13131 },
    };

    my $white_house = {
        class=>'GOVT', branch=>'Executive', usage=>'Presidents Residence', # ...
        address=>{ street_num=>1600, street_name=>'Pennsylvania Avenue, N.W.',
                   city=>'Washington', st=>'DC', zip=>20500 },
    };

    print_address($ma->{address});
    print_address($white_house->{address});

    sub print_address {
        my $addr = shift;
        print "$addr->{street_num} $addr->{street_name}\n$addr->{city} $addr->{st} $addr->{zip}\n";
    }
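As a rough sketch of where that leads: once the address lives in its own subhash, people and buildings can sit in the same list and be handled by the same loop. The format_address helper below is hypothetical (a variation that returns the text instead of printing it), and the records are trimmed-down copies of the ones in the example.

```perl
use strict;
use warnings;

# Hypothetical helper: build the address text rather than print it
sub format_address {
    my $addr = shift;
    return "$addr->{street_num} $addr->{street_name}\n"
         . "$addr->{city} $addr->{st} $addr->{zip}\n";
}

my $ma = {
    first=>'Morticia', last=>'Addams',
    address=>{ street_num=>131313, street_name=>'Mockingbird Lane',
               city=>'Perish', st=>'NY', zip=>13131 },
};

my $white_house = {
    class=>'GOVT', branch=>'Executive',
    address=>{ street_num=>1600, street_name=>'Pennsylvania Avenue, N.W.',
               city=>'Washington', st=>'DC', zip=>20500 },
};

# A person and a building in one list; only the address part matters here
for my $thing ($ma, $white_house) {
    print format_address($thing->{address}), "\n";
}
```

Returning a string keeps the helper usable for reports, labels, or anything else that needs the text somewhere other than STDOUT.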
...roboticus
When your only tool is a hammer, all problems look like your thumb.