sorting arrays with common index

shabang has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: sorting arrays with common index by roboticus (Chancellor) on Sep 07, 2017 at 17:33 UTC
shebang:

You should really invest a little time to get used to hashes: The parallel-array approach is one way you can get the flexibility of a hash, but it's brittle and prone to errors. The hash syntax is just a little trickier than parallel arrays, but once you're used to it, you'll find it simpler overall.

To illustrate a little of where I'm coming from, here's a simple program that will print a sorted list of people by age using parallel arrays and the same data in an array of hashes:

    $ cat pm_1198871_a.pl
    use strict;
    use warnings;

    # Parallel arrays:
    my @first_name = ('Joe', 'Bob', 'Mary', 'Sue');
    my @last_name = ('Smith', 'Jones', 'Blige', 'Parker');
    my @age = (25, 43, 19, 57);

    # An array of hashes:
    my @people = (
        { first=>'Joe', last=>'Smith', age=>25 },
        { first=>'Bob', last=>'Jones', age=>43 },
        { first=>'Mary', last=>'Blige', age=>19 },
        { first=>'Sue', last=>'Parker', age=>57 },
    );

    # Parallel arrays: Make a sort-by-age list of indices:
    my @indices = sort { $age[$a] <=> $age[$b] } 0 .. $#first_name;

    print "Using parallel arrays, sorted by age\n";
    for my $i (@indices) {
        print "$first_name[$i] $last_name[$i] $age[$i]\n";
    }

    # Array of hashes: Make a sort-by-age list of indices:
    @indices = sort { $people[$a]{age} <=> $people[$b]{age} } 0 .. $#people;

    print "\nUsing a hash, sorted by age\n";
    for my $i (@indices) {
        print "$people[$i]{first} $people[$i]{last} $people[$i]{age}\n";
    }

    $ perl pm_1198871_a.pl
    Using parallel arrays, sorted by age
    Mary Blige 19
    Joe Smith 25
    Bob Jones 43
    Sue Parker 57

    Using a hash, sorted by age
    Mary Blige 19
    Joe Smith 25
    Bob Jones 43
    Sue Parker 57

In both cases, I just created a sorted list of the array indexes containing the data and printed the report from it. As you can see, the code is very similar in structure. I find that the data in the hash section is a lot easier to read because the related items are right next to each other.

The sort statement is a little bit simpler in the parallel array section than in the array of hashes, but that simplicity is an illusion, for several reasons.

First, we just looked at a very simple case where we wanted to print out the data as a sorted report. But what happens if we really want to sort the data? Let's modify our code to put the data in the actual order we want:

    # Parallel arrays: Sort our data by age
    my @indices = sort { $age[$a] <=> $age[$b] } 0 .. $#first_name;
    @first_name = @first_name[@indices];
    @last_name = @last_name[@indices];
    @age = @age[@indices];

    print "Using parallel arrays, sorted by age\n";
    for my $i (0 .. $#first_name) {
        print "$first_name[$i] $last_name[$i] $age[$i]\n";
    }

In the parallel array version, we still resort to using a list of indices to sort on, then we have to rearrange all the parallel arrays. Immediately the code gets a bit longer. On the other hand, if we're sorting the array of hashes, we don't need to remember a list of indices: we can sort all the data in one step rather than four:

    # Array of hashes: Sort our data by age
    @people = sort { $a->{age} <=> $b->{age} } @people;

    print "\nUsing a hash, sorted by age\n";
    for my $hr (@people) {
        print "$hr->{first} $hr->{last} $hr->{age}\n";
    }

Note that the code became smaller rather than larger. We don't need a list of indices, because we don't have to try to map the changes over multiple data structures. Instead, sort can directly rearrange the array for us.

This post is already going long and I'm getting short on time, so I'll be brief on the other reasons that the simplicity is an illusion:

Unstated Assumptions

Any time you're having to manage multiple data structures in concert, you have to remember to do the appropriate actions everywhere relevant. The parallel array technique relies on two unstated assumptions:

All arrays are the same size.
This way, you can use any of the arrays to generate a list of indexes for sorting.

All arrays are changed the same way in every location.
Any time you change data (add/delete/replace) you must verify that you make the appropriate changes for all arrays. In small programs it's not a problem, but programs have a habit of becoming large.

Suppose you wanted to add a person's favorite color to your data. For the hash version, there's no problem--each slot in your array is a bundle of data for a particular person. Adding a favorite color just adds a little information to a specific person. In the parallel array version though, you must add the favorite color array, and then go through your program and find every location where you're changing one of your parallel arrays and ensure that you perform the proper operations on your favorite color array. A mistake anywhere could cause your data to become mismatched and useless.

Final Notes

When you begin with hashes, things will get a little sticky for a little while. But once you're accustomed to it, many things will suddenly get much easier. When you can just lump a complicated thing in a ball and forget about its internals, it opens up more of your brain to think about the larger problems in your programs. It also helps you make reusable chunks of code.

As an example, suppose you added addresses to your collection of "people", so you might have

    { first=>'Morticia', last=>'Addams', street_num=>131313,
      street_name=>'Mockingbird Lane', city=>'Perish', st=>'NY', zip=>13131, ... }

and include it in your person data. Later when someone asks to add buildings to your program, and you notice that they also have addresses, you could split out your address information into a subhash, and simply call out the address part when you call subroutines that deal with addresses.
Then you could use it with your buildings and not have to worry about sets of parallel arrays and how to mix arrays containing people and arrays containing buildings information:

    my $ma = {
        first=>'Morticia', last=>'Addams', age=>undef, favorite_color=>'black',
        address=>{ street_num=>131313, street_name=>'Mockingbird Lane',
                   city=>'Perish', st=>'NY', zip=>13131 },
    };

    my $white_house = {
        class=>'GOVT', branch=>'Executive', usage=>'Presidents Residence',  # ...
        address=>{ street_num=>1600, street_name=>'Pennsylvania Avenue, N.W.',
                   city=>'Washington', st=>'DC', zip=>20500 },
    };

    print_address($ma->{address});
    print_address($white_house->{address});

    sub print_address {
        my $addr = shift;
        print "$addr->{street_num} $addr->{street_name}\n$addr->{city} $addr->{st} $addr->{zip}\n";
    }

...roboticus

When your only tool is a hammer, all problems look like your thumb.
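
The first unstated assumption in the reply above (all arrays are the same size) can be made concrete. What follows is only a minimal sketch of the kind of guard that parallel-array code would need before each index sort. The helper name `same_length` is hypothetical and does not come from the thread, and the sample data deliberately drops one age to show the failure case.

```perl
use strict;
use warnings;

my @first_name = ('Joe', 'Bob', 'Mary', 'Sue');
my @last_name  = ('Smith', 'Jones', 'Blige', 'Parker');
my @age        = (25, 43, 19);    # one entry missing on purpose

# Hypothetical helper: takes array references and returns true
# only if every referenced array has the same number of elements.
sub same_length {
    my @arrays = @_;
    my $n = @{ $arrays[0] };      # element count of the first array
    for my $aref (@arrays) {
        return 0 if @$aref != $n;
    }
    return 1;
}

my $ok = same_length(\@first_name, \@last_name, \@age);
print $ok ? "arrays line up\n" : "arrays are out of step\n";
```

With an array of hashes there is nothing comparable to check, since each record carries its own fields.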
Re: sorting arrays with common index by Anonymous Monk on Sep 07, 2017 at 16:09 UTC
You really should learn about arrays of hashes from perldsc, but here you go:

    my @order = sort { $HOST_NAME[$a] cmp $HOST_NAME[$b] } 0 .. $#HOST_NAME;
    foreach my $i (@order) {
        print "$HOST_IP[$i]:$HOST_NAME[$i]:$HOST_DESCRIPTION[$i]\n";
    }
Re: sorting arrays with common index by Marshall (Canon) on Sep 08, 2017 at 10:49 UTC
Think about how you would do this in a spreadsheet. You would make columns for the values of HOST_IP, HOST_NAME, and HOST_DESCRIPTION. Then you would select the area of the spreadsheet and use the sort tool to sort by Column B.

Do it the same way in Perl. Make a single structure that is what we call an Array of Arrays, what you may think of as a 2-d matrix. Then just sort it by "Column B", i.e. the second column, which is index 1 because Perl numbers array elements starting at zero.

Here is some code that may move you in the right direction:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my @HOST_IP = ("1.2.3", "5.6.7", "8.9.10");
    my @HOST_NAME = ("Mary", "Zeke", "Bob");
    my @HOST_DESCRIPTION = ("Development", "Test", "Manufacturing");

    # create a single array structure and then sort that
    my @AoA;
    while (@HOST_IP)
    {
        push @AoA, [shift @HOST_IP, shift @HOST_NAME, shift @HOST_DESCRIPTION];
    }

    print "The combined data as an Array of Array:\n";
    foreach my $arrayRef (@AoA)
    {
        print "@$arrayRef\n";
    }

    @AoA = sort {$a->[1] cmp $b->[1]} @AoA;

    print "\nSorted by column 2...\n";
    foreach my $arrayRef (@AoA)
    {
        print "@$arrayRef\n";
    }
    __END__
    Prints:
    The combined data as an Array of Array:
    1.2.3 Mary Development
    5.6.7 Zeke Test
    8.9.10 Bob Manufacturing

    Sorted by column 2...
    8.9.10 Bob Manufacturing
    1.2.3 Mary Development
    5.6.7 Zeke Test
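
One thing to be aware of in the code above: the `while`/`shift` loop empties the three source arrays as it builds `@AoA`. If the originals are still needed afterwards, a possible variation (a sketch, not part of Marshall's reply) is to build the rows with `map` over the index range, which only reads from the arrays. The variable name `@sorted` is an illustrative choice.

```perl
use strict;
use warnings;

my @HOST_IP          = ("1.2.3", "5.6.7", "8.9.10");
my @HOST_NAME        = ("Mary", "Zeke", "Bob");
my @HOST_DESCRIPTION = ("Development", "Test", "Manufacturing");

# Build one row per index; the source arrays are read, not shifted.
my @AoA = map { [ $HOST_IP[$_], $HOST_NAME[$_], $HOST_DESCRIPTION[$_] ] }
          0 .. $#HOST_NAME;

# Sort the rows by the name column (index 1).
my @sorted = sort { $a->[1] cmp $b->[1] } @AoA;

print "@$_\n" for @sorted;
print "Original arrays still hold ", scalar(@HOST_NAME), " entries\n";
```

The trade-off is a second copy of the data in memory, which rarely matters for lists of this size.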
Re: sorting arrays with common index by ikegami (Patriarch) on Sep 08, 2017 at 01:28 UTC
    my @sorted_idxs = sort { $HOST_NAME[$a] cmp $HOST_NAME[$b] } 0..$#HOST_NAME;
    my @sorted_HOST_NAME = @HOST_NAME[ @sorted_idxs ];
    my @sorted_HOST_IP = @HOST_IP[ @sorted_idxs ];
    my @sorted_HOST_DESCRIPTION = @HOST_DESCRIPTION[ @sorted_idxs ];
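
For anyone wanting to follow the advice earlier in the thread and move the three HOST arrays into an array of hashes, here is a minimal sketch. The host data is made up for illustration, and the key names `ip`, `name`, and `desc` are arbitrary choices rather than anything from the original question.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the questioner's arrays.
my @HOST_IP          = ('10.0.0.3', '10.0.0.1', '10.0.0.2');
my @HOST_NAME        = ('web', 'db', 'mail');
my @HOST_DESCRIPTION = ('Web server', 'Database', 'Mail relay');

# One hash per host. The leading + tells Perl that the inner braces
# are an anonymous hash constructor, not a code block.
my @hosts = map {
    +{ ip => $HOST_IP[$_], name => $HOST_NAME[$_], desc => $HOST_DESCRIPTION[$_] }
} 0 .. $#HOST_NAME;

# A single sort now reorders every field together.
my @sorted = sort { $a->{name} cmp $b->{name} } @hosts;

print "$_->{ip}:$_->{name}:$_->{desc}\n" for @sorted;
```

Once the data is in this shape, sorting by a different field only means changing the key inside the sort block.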
