Tanalis has asked for the wisdom of the Perl Monks concerning the following question:

Hey all,

I'm writing a fairly nasty piece of code atm which will read in a table from a database (using Sybase::DBlib) and hash it based on user-provided keys (it has to be done this way - I need to read a lot of tables and need the reusability).

I am storing each row's data as it comes from the DB in a temporary hash keyed on column name, and want to append this to the master hash keyed on the specified columns (there could currently be up to 5 keys).

My problem comes when appending the data to the master hash - because the number of keys is unknown, I can think of no "clean" way to append the data. I have experimented a little with recursion, to no avail, and am left currently with long and ugly code which copes with up to 3 keys - which isn't enough. Is there an "easier" way to place this data into the hash?

Any suggestions would be appreciated.

My code atm is as follows:

# download the data from the db, using a trapped alarm in case of db # failure (prevents an infinite wait). alarm 600; local $SIG{ALRM} = sub { return $FALSE; }; $dbh -> dbcmd("SELECT * FROM $table"); $dbh -> dbsqlexec; $dbh -> dbresults; # get the column names for the data my @columnames; for (my $i = 0; $i <= $dbh -> dbnumcols; $i++) { push (@columnames, $dbh -> dbcolname($i)); } # write the table data to a hash my %tableData; while (my @data = $dbh -> dbnextrow) { my %tblData; for (my $i = 0; $i < $#columnames; $i++) { $data[$i] =~ s/^\s|\s$//g; $tblData{$columnames[$i+1]} = $data[$i]; } # get the data keys and hash accordingly my $hashkey = $tblData{$key}; delete $tblData{$key}; if (defined $key3) { # if there are 3 keys my $hashkey2 = $tblData{$key2}; my $hashkey3 = $tblData{$key3}; delete $tblData{$key2}; delete $tblData{$key3}; %{$tableData{$hashkey}{$hashkey2}{$hashkey3}} = %tblDa +ta; } elsif (defined $key2) { # if there are 2 keys my $hashkey2 = $tblData{$key2}; delete $tblData{$key2}; %{$tableData{$hashkey}{$hashkey2}} = %tblData; } else { # there's only 1 key %{$tableData{$hashkey}} = %tblData; } } return \%tableData;

I'm sure there must be a better way to do this, but I really can't see it.

Thanks -- Foxcub.

Replies are listed 'Best First'.
Re: Recursive hash assignment
by davorg (Chancellor) on Sep 05, 2002 at 09:35 UTC

    This seems to do the trick. I've tested it to 4 keys. There's probably a neater way to write it.

    Taking this logic and using it in your app is left as an exercise for the reader :)

    #!/usr/bin/perl -w use strict; use Data::Dumper; my %data; while (<DATA>) { chomp; my ($val, @keys) = split; my $lastkey = pop @keys; my $hashref = \%data; foreach my $k (@keys) { $hashref->{$k} ||= {}; $hashref = $hashref->{$k}; } $hashref->{$lastkey} = $val; } print Dumper \%data; __END__ val1 keyA keyX foo key1 val2 keyA keyY bar key2 val3 keyB keyZ baz key3
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Sample Data Structure
by Tanalis (Curate) on Sep 05, 2002 at 09:20 UTC
    Hopefully to help a little ..

    The data comes off of the database into an array, which is hashed using the column names determined earlier as keys. This gives a (temporary) hash with the following structure (called %tblData in the above):

    $tblData{$columnname} = data from array

    for each column name.

    This then needs to be added to a permenent hash (called %tableData above) keyed on one or more of these column names, the keys being provided by the user. (it's part of a sub that takes a table name and list of keys and returns a hash). This should give the structure:

    %{$tableData{$key1}{$key2}{$key3}} = %tblData

    ie creating a hash of hashes which could be accessed using, for example,
    my $data = $tableData{$key1}{$key2}{$key3}{$columnname}.

    --Foxcub.

      'ID0001' => { 'TR0001' => { 'transaction_id' => '101', 'paymethod' => 'csh', 'payee' => '921', 'pay_date' => '20020902' 'authorised' => 'LDG' }, 'TR0002' => { 'transaction_id' => '102', 'paymethod' => 'csh', 'payee' => '921', 'pay_date' => '20020904' 'authorised' => 'LDG' }, 'TR0003' => { 'transaction_id' => '103', 'paymethod' => 'csh', 'payee' => '921', 'pay_date' => '20020905' 'authorised' => 'DMB' } }, 'ID0002' => { 'TR0001' => { 'transaction_id' => '141', 'paymethod' => 'csh', 'payee' => '253', 'pay_date' => '20020902' 'authorised' => 'LDG' }, 'TR0002' => { 'transaction_id' => '142', 'paymethod' => 'csh', 'payee' => '254', 'pay_date' => '20020903' 'authorised' => 'MDB' }, 'TR0003' => { 'transaction_id' => '162', 'paymethod' => 'chq', 'payee' => '253', 'pay_date' => '20020905' 'authorised' => 'DMB' } }
      .. etc ..

      --Foxcub

Re: Recursive hash assignment
by seeken (Novice) on Sep 06, 2002 at 05:13 UTC
    For this type of thing, I tend to

    $key = join '|', @keys;
    $hash{$key} = $value;


    surfing the net and other cliches....
Partial solution - but it's very slow
by Tanalis (Curate) on Sep 06, 2002 at 11:49 UTC
    All,

    I've managed to write a solution to this problem that works, of sorts, but its *very* slow, especially with large tables (25,000 rows+).

    My code's below - any suggestions on how to improve the speed of this would be very much appreciated.

    my @tmpHash; my $lastKey = $keys[$#keys]; %{$tmpHash[$#keys]{$tblData{$lastKey}}} = %tblData; if ($#keys >= 1) { for (my $n = $#keys - 1; $n >= 0; $n--) { %{$tmpHash[$n]{$tblData{$keys[$n]}}} = %{$tmpHash[$n+1 +]}; } } %tableData = (%tableData, %{$tmpHash[0]});

    --Foxcub

Re: Recursive hash assignment
by Aristotle (Chancellor) on Sep 06, 2002 at 15:07 UTC
    One thing you could do to accellerate your code by probably an order of magnitude right away is to change %{$tableData{$hashkey}{$hashkey2}} = %tblData; to $tableData{$hashkey}{$hashkey2} = \%tblData;

    The syntax in the first sample causes a full copy of the hash to be created from scratch; the other way simply stores a reference to the already built hash, which is much faster. Since a new hash is created by my on each loop iteration, taking a reference to it is okay.

    You can further simplify this by reading perldoc -f delete and finding out that the function returns the value of the deleted element (so long as it's not a tied hash anyway). This means

    my $hashkey2 = $tblData{$key2}; delete $tblData{$key2}; $tableData{$hashkey}{$hashkey2} = \%tblData;
    can be contracted to $tableData{$hashkey}{delete $tblData{$key2}} = \%tblData;

    Now on to the juicy stuff. You can see that adding another level of depth to the hash assignment code always follows the same scheme, right? Well, it's just a matter of teaching Perl how to do that job for us. What we do is use eval to build a closure.

    my @key = grep defined, $key, $key2, $key3; my %tableData; my $store_row = eval join("", q( sub { my $row = shift; $tableData), map("{delete \$row->{'$_'}}", @key), q( = $row; })); while (my @data = $dbh -> dbnextrow) { s/^\s|\s$//g for @data; my %tblData; @tblData{@columnnames} = @data; $store_row(\%tblData); }

    Turns out to be rather short and painless. (Note I haven't tested this; I don't see any mistakes though.) This will work for any depth you want; just add keys to the @key array and off you go. I'm fairly confident that performance will be excellent. It would be best if you can arrange your keys to arrive in a @key array right away rather than in a loose collection of scalars. Note that the closure built takes a hashref as parameter.

    Pay attention to the backslash-escaped sigil in "{delete \$row->{$_}}"; while the $_ is not protected. That means the keynames get interpolated at "compile time", as the closure's code is built.

    Update 2002-09-09: "{delete \$row->{$_}}" will malfunction if the keyname has any whitespace characters. Added single quotes.

    Makeshifts last the longest.

Re: Recursive hash assignment
by blssu (Pilgrim) on Sep 06, 2002 at 20:53 UTC

    I coded up a solution first and then used the other answers to check my code. (This is a fun site. I wish I had more time.) This is very similar to davorg's solution, but a little more finished. The bonus to his solution is he tested it... ;)

    The original used alternation in the regex, which some (all?) versions of perl don't optimize. The original also used column name $i+1 for column value $i. That looked a lot like a bug since it wasn't documented, so I changed it too.

    # query setup omitted. # @keys is an array of your primary keys. The # first element of the array is the key used in # %tableData. Each following key is used to # identify a nested hash. my $last_key = pop @keys; my %tableData; while (my @data = $dbh -> dbnextrow) { my $row = { }; my $i = 0; foreach my $value (@data) { $value =~ s/^\s+//; $value =~ s/\s+$//; $row->{$columnames[$i++]} = $value; } my $nest = \%tableData; my $key_value; foreach my $key_name (@keys) { $key_value = delete $row->{$key_name}; $nest = ($nest->{$key_value}) ? ($nest->{$key_value}) : ($nest->{$key_value} = { }); } $key_value = delete $row->{$last_key}; $nest->{$key_value} = $row; } return \%tableData;
Re: Recursive hash assignment
by Anonymous Monk on Sep 07, 2002 at 00:37 UTC
    Here below is a generic way to get the value from arbitrary nestled keys:
    sub get_hash_key { my ($h, @keys) = @_; my $v = $h; $v = $v->{$_} foreach @keys; return $v; }
    But this isn't so useful if we want to change the data. So instead we want to return a reference to the value:
    sub get_hash_key_ref { my ($h, @keys) = @_; my $r = \$h; $r = \$$r->{$_} foreach @keys; return $r; }
    (Please note that we can't just take &get_hash_key and make the last line return \$v.)

    This technique has several advantages over the other common way (using ||= {}). One is simplicity. In this you do all the logic in one statement and you don't have to make an exception for the last key. Another is that you let perl to the autovivifying for you, so you don't even have to check for bad values. If there is a defined value that isn't a hash references you'll be notified with a (not-so-nice?) error message. This behaviour, I believe, is the most analogous behaviour you can find to doing a hardcoded dereference (overlooking eval EXPR), but I shouldn't say too much--it always turns out I missed something. :)

    Cheers,
    -Anomo