Recursive hash assignment

Tanalis has asked for the wisdom of the Perl Monks concerning the following question:

Hey all,

I'm writing a fairly nasty piece of code atm which will read in a table from a database (using Sybase::DBlib) and hash it based on user-provided keys (it has to be done this way - I need to read a lot of tables and need the reusability).

I am storing each row's data as it comes from the DB in a temporary hash keyed on column name, and want to append this to the master hash keyed on the specified columns (there could currently be up to 5 keys).

My problem comes when appending the data to the master hash - because the number of keys is unknown, I can think of no "clean" way to append the data. I have experimented a little with recursion, to no avail, and am left currently with long and ugly code which copes with up to 3 keys - which isn't enough. Is there an "easier" way to place this data into the hash?

Any suggestions would be appreciated.

My code atm is as follows:

# download the data from the db, using a trapped alarm in case of db
# failure (prevents an infinite wait).
alarm 600;
local $SIG{ALRM} = sub { return $FALSE; };

$dbh -> dbcmd("SELECT * FROM $table");
$dbh -> dbsqlexec;
$dbh -> dbresults;

# get the column names for the data
my @columnames;
for (my $i = 0; $i <= $dbh -> dbnumcols; $i++)
  {
      push (@columnames, $dbh -> dbcolname($i));
    }
                
# write the table data to a hash
my %tableData;
while (my @data = $dbh -> dbnextrow)
    {
        my %tblData;
        for (my $i = 0; $i < $#columnames; $i++)
            {
                $data[$i] =~ s/^\s|\s$//g;
                $tblData{$columnames[$i+1]} = $data[$i];
            }
                                    
                                    
        # get the data keys and hash accordingly
        my $hashkey = $tblData{$key};
        delete $tblData{$key};
                                        
        if (defined $key3)
          { # if there are 3 keys
              my $hashkey2 = $tblData{$key2};
                my $hashkey3 = $tblData{$key3};
                delete $tblData{$key2};
                delete $tblData{$key3};

                %{$tableData{$hashkey}{$hashkey2}{$hashkey3}} = %tblDa
+ta;
            }
        elsif (defined $key2)
          { # if there are 2 keys
              my $hashkey2 = $tblData{$key2};
                delete $tblData{$key2};
                        
                %{$tableData{$hashkey}{$hashkey2}} = %tblData;
      }
        else
          { # there's only 1 key
              %{$tableData{$hashkey}} = %tblData;
            }
     }
                
return \%tableData;
[download]

I'm sure there must be a better way to do this, but I really can't see it.

Thanks -- Foxcub.

Comment on Recursive hash assignment Download Code

Replies are listed 'Best First'.
Re: Recursive hash assignment by davorg (Chancellor) on Sep 05, 2002 at 09:35 UTC
This seems to do the trick. I've tested it to 4 keys. There's probably a neater way to write it. Taking this logic and using it in your app is left as an exercise for the reader :) `#!/usr/bin/perl -w use strict; use Data::Dumper; my %data; while (<DATA>) { chomp; my ($val, @keys) = split; my $lastkey = pop @keys; my $hashref = \%data; foreach my $k (@keys) { $hashref->{$k} \|\|= {}; $hashref = $hashref->{$k}; } $hashref->{$lastkey} = $val; } print Dumper \%data; __END__ val1 keyA keyX foo key1 val2 keyA keyY bar key2 val3 keyB keyZ baz key3` [download] -- <http://www.dave.org.uk> "The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg	[reply] [d/l]
Sample Data Structure by Tanalis (Curate) on Sep 05, 2002 at 09:20 UTC
Hopefully to help a little .. The data comes off of the database into an array, which is hashed using the column names determined earlier as keys. This gives a (temporary) hash with the following structure (called `%tblData` in the above): `$tblData{$columnname} = data from array` for each column name. This then needs to be added to a permenent hash (called `%tableData` above) keyed on one or more of these column names, the keys being provided by the user. (it's part of a sub that takes a table name and list of keys and returns a hash). This should give the structure: `%{$tableData{$key1}{$key2}{$key3}} = %tblData` ie creating a hash of hashes which could be accessed using, for example, `my $data = $tableData{$key1}{$key2}{$key3}{$columnname}`. --Foxcub.	[reply] [d/l] [select]
Data for 2 keys .. by Tanalis (Curate) on Sep 05, 2002 at 09:53 UTC
'ID0001' => { 'TR0001' => { 'transaction_id' => '101', 'paymethod' => 'csh', 'payee' => '921', 'pay_date' => '20020902' 'authorised' => 'LDG' }, 'TR0002' => { 'transaction_id' => '102', 'paymethod' => 'csh', 'payee' => '921', 'pay_date' => '20020904' 'authorised' => 'LDG' }, 'TR0003' => { 'transaction_id' => '103', 'paymethod' => 'csh', 'payee' => '921', 'pay_date' => '20020905' 'authorised' => 'DMB' } }, 'ID0002' => { 'TR0001' => { 'transaction_id' => '141', 'paymethod' => 'csh', 'payee' => '253', 'pay_date' => '20020902' 'authorised' => 'LDG' }, 'TR0002' => { 'transaction_id' => '142', 'paymethod' => 'csh', 'payee' => '254', 'pay_date' => '20020903' 'authorised' => 'MDB' }, 'TR0003' => { 'transaction_id' => '162', 'paymethod' => 'chq', 'payee' => '253', 'pay_date' => '20020905' 'authorised' => 'DMB' } } [download] .. etc .. --Foxcub	[reply] [d/l]
Re: Recursive hash assignment by seeken (Novice) on Sep 06, 2002 at 05:13 UTC
For this type of thing, I tend to $key = join '\|', @keys; $hash{$key} = $value; surfing the net and other cliches....	[reply]
Partial solution - but it's very slow by Tanalis (Curate) on Sep 06, 2002 at 11:49 UTC
All, I've managed to write a solution to this problem that works, of sorts, but its very slow, especially with large tables (25,000 rows+). My code's below - any suggestions on how to improve the speed of this would be very much appreciated. `my @tmpHash; my $lastKey = $keys[$#keys]; %{$tmpHash[$#keys]{$tblData{$lastKey}}} = %tblData; if ($#keys >= 1) { for (my $n = $#keys - 1; $n >= 0; $n--) { %{$tmpHash[$n]{$tblData{$keys[$n]}}} = %{$tmpHash[$n+1 +]}; } } %tableData = (%tableData, %{$tmpHash[0]});` [download] --Foxcub	[reply] [d/l]
Re: Recursive hash assignment by Aristotle (Chancellor) on Sep 06, 2002 at 15:07 UTC
One thing you could do to accellerate your code by probably an order of magnitude right away is to change `%{$tableData{$hashkey}{$hashkey2}} = %tblData;` to `$tableData{$hashkey}{$hashkey2} = \%tblData;` The syntax in the first sample causes a full copy of the hash to be created from scratch; the other way simply stores a reference to the already built hash, which is much faster. Since a new hash is created by `my` on each loop iteration, taking a reference to it is okay. You can further simplify this by reading perldoc -f delete and finding out that the function returns the value of the deleted element (so long as it's not a tied hash anyway). This means `my $hashkey2 = $tblData{$key2}; delete $tblData{$key2}; $tableData{$hashkey}{$hashkey2} = \%tblData;` [download] can be contracted to `$tableData{$hashkey}{delete $tblData{$key2}} = \%tblData;` Now on to the juicy stuff. You can see that adding another level of depth to the hash assignment code always follows the same scheme, right? Well, it's just a matter of teaching Perl how to do that job for us. What we do is use eval to build a closure. `my @key = grep defined, $key, $key2, $key3; my %tableData; my $store_row = eval join("", q( sub { my $row = shift; $tableData), map("{delete \$row->{'$_'}}", @key), q( = $row; })); while (my @data = $dbh -> dbnextrow) { s/^\s\|\s$//g for @data; my %tblData; @tblData{@columnnames} = @data; $store_row(\%tblData); }` [download] Turns out to be rather short and painless. (Note I haven't tested this; I don't see any mistakes though.) This will work for any depth you want; just add keys to the `@key` array and off you go. I'm fairly confident that performance will be excellent. It would be best if you can arrange your keys to arrive in a `@key` array right away rather than in a loose collection of scalars. Note that the closure built takes a hashref as parameter. Pay attention to the backslash-escaped sigil in `"{delete \$row->{$_}}"`; while the `$_` is not protected. That means the keynames get interpolated at "compile time", as the closure's code is built. Update 2002-09-09: `"{delete \$row->{$_}}"` will malfunction if the keyname has any whitespace characters. Added single quotes. Makeshifts last the longest.	[reply] [d/l] [select]
Re: Recursive hash assignment by blssu (Pilgrim) on Sep 06, 2002 at 20:53 UTC
I coded up a solution first and then used the other answers to check my code. (This is a fun site. I wish I had more time.) This is very similar to davorg's solution, but a little more finished. The bonus to his solution is he tested it... ;) The original used alternation in the regex, which some (all?) versions of perl don't optimize. The original also used column name $i+1 for column value $i. That looked a lot like a bug since it wasn't documented, so I changed it too. # query setup omitted. # @keys is an array of your primary keys. The # first element of the array is the key used in # %tableData. Each following key is used to # identify a nested hash. my $last_key = pop @keys; my %tableData; while (my @data = $dbh -> dbnextrow) { my $row = { }; my $i = 0; foreach my $value (@data) { $value =~ s/^\s+//; $value =~ s/\s+$//; $row->{$columnames[$i++]} = $value; } my $nest = \%tableData; my $key_value; foreach my $key_name (@keys) { $key_value = delete $row->{$key_name}; $nest = ($nest->{$key_value}) ? ($nest->{$key_value}) : ($nest->{$key_value} = { }); } $key_value = delete $row->{$last_key}; $nest->{$key_value} = $row; } return \%tableData; [download]	[reply] [d/l]
Re: Recursive hash assignment by Anonymous Monk on Sep 07, 2002 at 00:37 UTC
Here below is a generic way to get the value from arbitrary nestled keys: `sub get_hash_key { my ($h, @keys) = @_; my $v = $h; $v = $v->{$_} foreach @keys; return $v; }` [download] But this isn't so useful if we want to change the data. So instead we want to return a reference to the value: `sub get_hash_key_ref { my ($h, @keys) = @_; my $r = \$h; $r = \$$r->{$_} foreach @keys; return $r; }` [download] (Please note that we can't just take `&get_hash_key` and make the last line `return \$v`.) This technique has several advantages over the other common way (using `\|\|= {}`). One is simplicity. In this you do all the logic in one statement and you don't have to make an exception for the last key. Another is that you let perl to the autovivifying for you, so you don't even have to check for bad values. If there is a defined value that isn't a hash references you'll be notified with a (not-so-nice?) error message. This behaviour, I believe, is the most analogous behaviour you can find to doing a hardcoded dereference (overlooking `eval EXPR`), but I shouldn't say too much--it always turns out I missed something. `:)` Cheers, -Anomo	[reply] [d/l] [select]