hsinclai has asked for the wisdom of the Perl Monks concerning the following question:

Wise ones,

The following is academic and may never occur in the real world; I haven't seen this arrangement in any of the data structure discussions I've read so far on PM.

Suppose, as in the table below, that we have four arrays, each containing data about different hosts: an array of hostnames, one of IP addresses, one of PTR records, and one of uptimes. If you imagine a database table, each array would be one column of that table. (Since this is academic, could we assume the data is sorted correctly, so that the elements at the same index all refer to the same host?)

Array 1      Array 2        Array 3                    Array 4
(hostname)   (IP address)   (PTR record)               (uptime)
hostname1    1.2.3.5        5.3.2.1.in-addr.arpa       131 days
hostname2    11.22.33.55    55.33.22.11.in-addr.arpa   128 days
hostname3    22.21.20.55    55.20.21.22.in-addr.arpa   366 days

If you placed the arrays side by side as columns, you could reconstruct the table above. For each record (row) in this imaginary table, I'd like to add an entry to %ns_records whose key is the hostname and whose value is a reference to an array of the other values in that row.

This code builds the hash, using the hostnames as keys and, as each value, an array of the corresponding elements from the other arrays:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my @ns_list = (
    'server1.foo-domain.net',
    'server2.noo-domain.net',
    'server3.zoo-domain.net',
);
my @addr_list = ( '1.2.3.5', '11.22.33.55', '22.21.20.55', );
my @ptr_list = (
    '5.3.2.1.in-addr.arpa',
    '55.33.22.11.in-addr.arpa',
    '55.20.21.22.in-addr.arpa',
);
my @uptime_list = ( '131 days', '28 days', '366 days', );

my %ns_records = map {
    $_ => [ shift @addr_list, shift @ptr_list, shift @uptime_list ]
} @ns_list;

my $d;
foreach my $k ( keys %ns_records ) {
    print "\n$k:";
    for ( $d = 0; $d <= $#{$ns_records{$k}}; $d++ ) {
        print "\n ", @{$ns_records{$k}}->[$d];
    }
    print $/;
}
print $/;

__END__
print Dumper \%ns_records;

$VAR1 = {
    'ns0.foo-domain.com' => [ '1.2.3.5', '5.3.2.1.in-addr.arpa', '131 days' ],
    'ns0.baz-domain.org' => [ '22.21.20.55', '55.20.21.22.in-addr.arpa', '366 days' ],
    'ns0.bar-domain.net' => [ '11.22.33.55', '55.33.22.11.in-addr.arpa', '28 days' ]
};
So, can the block containing map be golfed, or is there a more efficient or elegant way to get the same result?

Also, I don't understand why I get the warning
"Using an array as a reference is deprecated"
when printing out the result, even though the code still runs under Perl 5.8.1. I'd like to be able to dereference the array inside the HoA without the warning.

Thanks to all !

Re: Constructing a HoA from 4 separate arrays
by davido (Cardinal) on Oct 18, 2004 at 03:07 UTC

    First, change @{$ns_records{$k}}->[$d] to $ns_records{$k}->[$d], and your warning goes away. You are trying to use an array as an arrayref, as the warning states. You wouldn't normally say "@array[0]" (that would be a slice). You would say "$array[0]". Dealing with references doesn't change how sigils are applied.
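    To make this concrete, here is a minimal sketch of the question's printing loop with the fix applied, using a foreach over the index range instead of the C-style loop. The two-host hash is hypothetical sample data. The last lines show that ${$ref}[2], $ref->[2] and $ns_records{...}[2] all name the same scalar, since the arrow between two subscripts is optional. (In later perls, 5.22 onward, the @{...}->[...] form is a fatal error rather than a deprecation warning.)

```perl
use strict;
use warnings;

# Hypothetical two-host sample in the same shape as %ns_records
my %ns_records = (
    'server1.foo-domain.net' => [ '1.2.3.5',     '5.3.2.1.in-addr.arpa',     '131 days' ],
    'server2.noo-domain.net' => [ '11.22.33.55', '55.33.22.11.in-addr.arpa', '28 days' ],
);

foreach my $k ( sort keys %ns_records ) {
    print "\n$k:";
    # $#{ $ns_records{$k} } is the last index of the array the value points to
    for my $d ( 0 .. $#{ $ns_records{$k} } ) {
        print "\n    ", $ns_records{$k}->[$d];    # one element: $ sigil, not @
    }
    print "\n";
}

# Three equivalent spellings of the same element
my $ref  = $ns_records{'server1.foo-domain.net'};
my @same = ( ${$ref}[2], $ref->[2], $ns_records{'server1.foo-domain.net'}[2] );
print join( ' | ', @same ), "\n";    # 131 days | 131 days | 131 days
```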

    And here's another (and to me, clearer) way to build the datastructure.

    use strict;
    use warnings;
    use Data::Dumper;

    my @ns_list = (
        'server1.foo-domain.net',
        'server2.noo-domain.net',
        'server3.zoo-domain.net',
    );
    my @addr_list = ( '1.2.3.5', '11.22.33.55', '22.21.20.55', );
    my @ptr_list = (
        '5.3.2.1.in-addr.arpa',
        '55.33.22.11.in-addr.arpa',
        '55.20.21.22.in-addr.arpa',
    );
    my @uptime_list = ( '131 days', '28 days', '366 days', );

    my %hash;
    $hash{$ns_list[$_]} = [ $addr_list[$_], $ptr_list[$_], $uptime_list[$_] ]
        for 0 .. $#ns_list;
    print Dumper \%hash;

    Dave

      That's definitely clearer - thanks Dave!

Re: Constructing a HoA from 4 separate arrays
by Zaxo (Archbishop) on Oct 18, 2004 at 03:21 UTC

    You can do that by working with the indexes rather than shifting:

    my %ns_records = map {
        $ns_list[$_] => [ $addr_list[$_], $ptr_list[$_], $uptime_list[$_] ]
    } 0 .. $#ns_list;
    That is non-destructive of your original data.
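    A small sketch (hypothetical two-host data) that makes the difference observable: after the index-based map the source arrays still hold their elements, while the shift-based map from the question leaves them empty. Both maps build the same hash.

```perl
use strict;
use warnings;

my @ns_list     = ( 'server1.foo-domain.net', 'server2.noo-domain.net' );
my @addr_list   = ( '1.2.3.5', '11.22.33.55' );
my @ptr_list    = ( '5.3.2.1.in-addr.arpa', '55.33.22.11.in-addr.arpa' );
my @uptime_list = ( '131 days', '28 days' );

# Index-based map: only reads the arrays
my %by_index = map {
    $ns_list[$_] => [ $addr_list[$_], $ptr_list[$_], $uptime_list[$_] ]
} 0 .. $#ns_list;
print 'after index map, @addr_list has ', scalar @addr_list, " elements\n";    # 2

# shift-based map: same hash, but consumes the three value arrays
my %by_shift = map {
    $_ => [ shift @addr_list, shift @ptr_list, shift @uptime_list ]
} @ns_list;
print 'after shift map, @addr_list has ', scalar @addr_list, " elements\n";    # 0
```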

    A small point of style. The _list suffixes on your arrays don't add much to the readability of your code. I would drop them in favor of descriptive plurals like @addrs, @ptrs, @uptimes. Your hash would read better with a singular name, $ns_record{'foo.org'} pronounced "ns_record of foo.org".

    The warning should have a line number associated with it. I think it is from line 33, @{$ns_records{$k}}->[$d], which should be $ns_records{$k}->[$d].

    After Compline,
    Zaxo

      Your hash would read better with a singular name, $ns_record{'foo.org'} pronounced "ns_record of foo.org".

      By that logic, arrays should be singular as well, because $addr[4] is pronounced "address number 4". Some consistency would be nice.

      For hashes, I use plural when they're used as an array, and singular when they're used as a structure: %ns_records would be a list of records ($ns_records{$domain}), while %ns_record would be a record with named fields ($ns_record{'ptr'}).
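      A minimal sketch of that convention, assuming field names addr, ptr and uptime (only ptr appears above; the other two are made up for illustration). Note that holding each record as a hash with named fields is a different layout from the HoA in the question.

```perl
use strict;
use warnings;

# Singular name: one record with named fields
my %ns_record = (
    addr   => '1.2.3.5',
    ptr    => '5.3.2.1.in-addr.arpa',
    uptime => '131 days',
);
print "ptr: $ns_record{'ptr'}\n";

# Plural name: many records keyed by hostname (here a hash of hashes)
my %ns_records = ( 'server1.foo-domain.net' => { %ns_record } );
print "uptime: $ns_records{'server1.foo-domain.net'}{uptime}\n";
```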

      Thanks for the tips - any errors are so glaring after you guys point them out!

      ...that is non-destructive of your original data.
      That is very instructive!

Re: Constructing a HoA from 4 separate arrays
by tmoertel (Chaplain) on Oct 18, 2004 at 04:31 UTC
    Your tabular data is in column-wise format, which isn't very convenient for loading into a hash, row by row. Let's create a helper function to transpose the data into row-wise format:
    sub transpose {
        my $rows = shift;
        my $max_col = @{ $rows->[0] } - 1;
        [ map { my $c=$_; [ map {($_->[$c])} @$rows ] } 0..$max_col ]
    }
    If we give transpose row-wise data, it will give us back column-wise data and vice versa.
    # [ [ 0, 1 ]   <== transpose ==>   [ [ 0, 2 ]
    # , [ 2, 3 ] ]                     , [ 1, 3 ] ]
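    A quick sketch to check the "and vice versa" claim: it applies transpose (laid out over several lines, logic unchanged) to the 2x2 example, transposes the result back, and also tries a non-square input. Like the original, it assumes every row is as long as the first; show is a hypothetical helper added only for printing.

```perl
use strict;
use warnings;

# transpose, reformatted; assumes every row has as many columns as the first
sub transpose {
    my $rows    = shift;
    my $max_col = @{ $rows->[0] } - 1;
    return [ map { my $c = $_; [ map { $_->[$c] } @$rows ] } 0 .. $max_col ];
}

# Hypothetical helper: render a matrix as "row | row | ..."
sub show { return join( ' | ', map { join( ' ', @$_ ) } @{ $_[0] } ) }

my $t = transpose( [ [ 0, 1 ], [ 2, 3 ] ] );
print show($t), "\n";                 # 0 2 | 1 3
print show( transpose($t) ), "\n";    # 0 1 | 2 3  (back to the input)

# Non-square input: 2 rows of 3 become 3 rows of 2
my $u = transpose( [ [ 'a', 'b', 'c' ], [ 1, 2, 3 ] ] );
print show($u), "\n";                 # a 1 | b 2 | c 3
```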
    With this helper function, we can easily put your input data into the desired format:
    my $data_by_rows = transpose( [ \@ns_list, \@addr_list, \@ptr_list, \@uptime_list ] ) ;
    Now we have row-wise data:
    $data_by_rows = [
        [ 'server1.foo-domain.net', '1.2.3.5', '5.3.2.1.in-addr.arpa', '131 days' ],
        [ 'server2.noo-domain.net', '11.22.33.55', '55.33.22.11.in-addr.arpa', '28 days' ],
        [ 'server3.zoo-domain.net', '22.21.20.55', '55.20.21.22.in-addr.arpa', '366 days' ]
    ];

    All that's left to do is convert each row into hostname=>[data] format and load it into your hash:

    my %ns_records = map {( shift @$_, [ @$_ ] )} @$data_by_rows;
    That's it!

    While munging the data manually isn't difficult, by factoring out the transposition part of the job, we made our solution easier to understand. We also created a handy helper function that we can reuse on other projects.
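    Putting these pieces together, a self-contained sketch using the question's sample data. One detail worth knowing: shift @$_ shortens each row of $data_by_rows as it goes, so that intermediate structure is consumed, while the four original arrays are untouched because transpose built new row arrays.

```perl
use strict;
use warnings;

sub transpose {
    my $rows    = shift;
    my $max_col = @{ $rows->[0] } - 1;
    return [ map { my $c = $_; [ map { $_->[$c] } @$rows ] } 0 .. $max_col ];
}

my @ns_list     = ( 'server1.foo-domain.net', 'server2.noo-domain.net', 'server3.zoo-domain.net' );
my @addr_list   = ( '1.2.3.5', '11.22.33.55', '22.21.20.55' );
my @ptr_list    = ( '5.3.2.1.in-addr.arpa', '55.33.22.11.in-addr.arpa', '55.20.21.22.in-addr.arpa' );
my @uptime_list = ( '131 days', '28 days', '366 days' );

# Columns in, rows out: each row is [ hostname, addr, ptr, uptime ]
my $data_by_rows = transpose( [ \@ns_list, \@addr_list, \@ptr_list, \@uptime_list ] );

# shift takes the hostname off the row (modifying that row);
# [ @$_ ] then copies the remaining three values into a new array
my %ns_records = map { ( shift @$_, [ @$_ ] ) } @$data_by_rows;

for my $host ( sort keys %ns_records ) {
    print "$host: @{ $ns_records{$host} }\n";
}
```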

    Cheers,
    Tom

Re: Constructing a HoA from 4 separate arrays
by ihb (Deacon) on Oct 18, 2004 at 06:48 UTC

    I have a local module, my own version of List::Util, containing the utilities I find I frequently want for lists. It defines &zip and &group, which come in really handy for tasks like this:

    use ihb::List::Util qw/ zip group /;

    my %ns_records;
    @ns_records{@ns_list} = group(3, zip(\( @addr_list, @ptr_list, @uptime_list )));

    &zip(ARRAYS) takes a list of array references and zips them together into one list. zip([ 'a' .. 'd' ], [ 1 .. 4 ]) => qw/ a 1 b 2 c 3 d 4 /.

    &group(LENGTH, LIST) splits the elements of LIST into a list of array references, each holding LENGTH elements.
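    Since ihb::List::Util is a private module, here is a sketch with hypothetical stand-in implementations of zip and group, written only from the two descriptions above; they are not ihb's code. This zip assumes all the arrays have the same length as the first. It relies on \( @a, @b, @c ) returning a list of references to the three arrays.

```perl
use strict;
use warnings;

# Hypothetical zip(ARRAYREFS): interleave the arrays into one flat list.
# Assumes every array is as long as the first.
sub zip {
    my @arrays = @_;
    my $last   = $#{ $arrays[0] };
    return map { my $i = $_; map { $_->[$i] } @arrays } 0 .. $last;
}

# Hypothetical group(LENGTH, LIST): chop LIST into arrayrefs of LENGTH elements
sub group {
    my ( $len, @list ) = @_;
    my @groups;
    push @groups, [ splice @list, 0, $len ] while @list;
    return @groups;
}

my @ns_list     = ( 'server1.foo-domain.net', 'server2.noo-domain.net' );
my @addr_list   = ( '1.2.3.5', '11.22.33.55' );
my @ptr_list    = ( '5.3.2.1.in-addr.arpa', '55.33.22.11.in-addr.arpa' );
my @uptime_list = ( '131 days', '28 days' );

print join( ' ', zip( [ 'a' .. 'd' ], [ 1 .. 4 ] ) ), "\n";    # a 1 b 2 c 3 d 4

my @arrays = \( @addr_list, @ptr_list, @uptime_list );
my %ns_records;
@ns_records{@ns_list} = group( scalar @arrays, zip(@arrays) );
print "$_ => @{ $ns_records{$_} }\n" for @ns_list;
```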

    Update: If you'd rather not have to keep the group length in sync with the number of arrays by hand, you can do this instead:

    use ihb::List::Util qw/ zip group /;

    my @arrays = \( @addr_list, @ptr_list, @uptime_list );
    my %ns_records;
    @ns_records{@ns_list} = group(scalar @arrays, zip(@arrays));
    This is how I'd prefer to do it, but I didn't want to clutter the first example, which I think is easier to understand at first glance.

    ihb

    Read argumentation in its context!

Re: Constructing a HoA from 4 separate arrays
by TedPride (Priest) on Oct 18, 2004 at 05:30 UTC
    Ok, there seem to be two basic methods given here for constructing the hash and its contents:
    my %ns_records = map { $_ => [ shift @addr_list, shift @ptr_list, shift @uptime_list ] } @ns_list;
    and (approximately; this was my version, but I see it has already been given above):
    for (0..$#ns_list) {
        $ns_records{$ns_list[$_]} = [$addr_list[$_], $ptr_list[$_], $uptime_list[$_]];
    }
    The question is, which is better? The second is perhaps easier to understand, and it also leaves the original arrays intact in case you want them later, but in speed tests (1,000,000 iterations) I found that the first takes only 15 seconds vs. 19 seconds for the second. The same results hold if you reverse everything and use pop:
    my %ns_records = map { $_ => [ pop @addr_list, pop @ptr_list, pop @uptime_list ] } reverse @ns_list;
    But what if there are more items? Using arrays of 1,000 items each and looping 10,000 times, I got 55 seconds for map/shift, 55 seconds for map/pop, and 45 seconds for accessing the arrays directly. Apparently modifying the arrays costs extra processing time as the number of items grows.

    Bottom line though, it doesn't really matter from an efficiency standpoint which way you go. Use whichever method is more readable :)
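    Because the shift and pop variants empty the arrays they consume, a benchmark that reuses the same arrays only does real work on its first iteration. A sketch of a fairer comparison using the core Benchmark module's cmpthese, under these assumptions: made-up data of 1,000 items per column, and every variant copies the columns first, so each iteration starts from full arrays and all three pay the same copying cost. No timings are claimed here; they will depend on the machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up data: 1,000 items per column
my $n       = 1_000;
my @ns      = map { "host$_.example.net" } 1 .. $n;
my @addrs   = map { "addr-$_" } 1 .. $n;
my @ptrs    = map { "ptr-$_" } 1 .. $n;
my @uptimes = map { "$_ days" } 1 .. $n;

# Each variant works on fresh copies, so nothing carries over between runs
sub by_shift {
    my @ad = @addrs; my @pt = @ptrs; my @up = @uptimes;
    my %h = map { $_ => [ shift @ad, shift @pt, shift @up ] } @ns;
    return \%h;
}

sub by_pop {
    my @ad = @addrs; my @pt = @ptrs; my @up = @uptimes;
    my %h = map { $_ => [ pop @ad, pop @pt, pop @up ] } reverse @ns;
    return \%h;
}

sub by_index {
    my @ad = @addrs; my @pt = @ptrs; my @up = @uptimes;
    my %h;
    $h{ $ns[$_] } = [ $ad[$_], $pt[$_], $up[$_] ] for 0 .. $#ns;
    return \%h;
}

# Negative count: run each sub for at least 1 CPU second, then compare rates
cmpthese( -1, {
    map_shift => \&by_shift,
    map_pop   => \&by_pop,
    index     => \&by_index,
} );
```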

      Could you post your benchmark? Does it deal with the fact that the arrays get destroyed on the first test?


      Dave