scottb has asked for the wisdom of the Perl Monks concerning the following question:

All, I'm dealing with data in an "infinate dimension hash", that is, a hash where the number of keys to fetch a value is unknown and is not of an arbitary size. For example I could have:
$DATA{1}{a}{e} = 'snu'; $DATA{2}{a}{f} = 'woo'; $DATA{3}{b} = 'foo'; $DATA{4}{a}{e} = 'bar'; $DATA{5}{d} = 'bop';
I'm writing some functions to work with the data and reading the hash and just particular branches of keys has not been a problem. But I'd like to be able to arbitarily create a branch in this data structure based on a reference to a point in it. So for example, I'd like to be able to do:
put_element($DATA{3}, 'h', 'i', 'worked');
Which would in effect create the following:
$DATA{3}{h}{i} = 'worked';
This function would be useful in culling and replacing brances of the data structure, and so on. But, alas, I am having issues. After I call the function that I have written, and then walk the subsiquent hash, it only contains the original seeded values, and not the one I added with my function. I can't figure out what is wrong, or more importantly, how to make it do what I would like. The function, as I have written it, is as follows:
sub put_element { my $current = shift || die("Need root hash reference for put_element +!"); # reference to the current level my $value = pop || die("Need value to be put for put_element!"); + # value to be placed in structure $current = $$current{shift(@_)} while (@_ > 0); ${$current} = $value; return 1; }
Any input into where my errors may be is appreciated.

Thanks,
Scott

Replies are listed 'Best First'.
Re: Missing something with regards to references
by blokhead (Monsignor) on Nov 01, 2004 at 22:35 UTC
    Your code is pretty close, but I'm afraid you need another level of referencing to deal with autovivifying. I'll try to explain why... Consider when you have an empty $DATA{3}{h} and call put_element($DATA{3}, 'h', 'i', "worked"). Then your code eventually does:
    • $current = $DATA{3}{h}; (this returns undef)
    • $current = $$current{i};
    That last line autovififies the undef in $current to a fresh anonymous hashref, fetches the key 'i' from the new anonymous hash (which returns undef), and sets $current to this undef... But overwriting the value of $current cleans up the anonymous hash that was just created. It's lost forever. Whoops!

    Instead of this, you need to keep another level of references. Now instead of trying to keep the values of $DATA{3}, $DATA{3}{h}, $DATA{3}{h}{i} (which are copied, so don't do the right thing if the hash key doesn't exist), we have to keep references to those slots of the hashes. This way the new hashes can be autovivified in the appropriate slot, and not into your temporary variable.

    You also need to use $_[0] instead of shift if the first argument has a hash dereference that may be autovifified (since shift makes a copy, but $_[0] is an alias to the first arg, not a copy).

    sub put_element { my $current = \ $_[0]; shift; my $value = pop; $current = \ $$current->{ +shift } while @_; $$current = $value; } use Data::Dumper; my %DATA; put_element( $DATA{3}, 'h', 'i', "Worked" ); ## $DATA{3} autovifified + too put_element( $DATA{3}, 'b', 'y', 'e' ); ## $DATA{3} exists print Dumper \%DATA;
    Yes, the autovivification makes this a very sticky situation. But if you can understand this, you are in very good shape with your understanding of references.

    blokhead

      Thank you for the explination.

      It sort of make sense to me, but this is entirely new territory for me. I'd had a hunch at one time that the undef keys beneath the one I wanted would be a problem, but I couldn't quite wrap myself around what could be done about it. Using the reference rather than copies of values sort of makes sense too, but I think it needs time to sink in. I think the part about using $_[0] instead of shift makes the least sense to me... since I am always pathing through a reference, I apparently need to make a reference to the reference?

      Anyway, thank you. This is very helpful.

        since I am always pathing through a reference, I apparently need to make a reference to the reference?

        \$_[0] is not needed if the given (first) key is already a hashref. E.g.:

        my %DATA = (3 => {}); put_element($DATA{3}, ...);

        But it is needed if the key doesn't exist yet:

        my %DATA = (3 => {}); put_element($DATA{42}, ...);

        "But why doesn't my $current = \shift work? Why do I have to use my $current = \$_[0]?" Answer: If the key already exists and its value is a hashref, both possibilities work. But if the key does not exist, we have to make sure, that we really get a reference to the entry of %DATA, not the value of the non-existing entry.

        sub test1 { my $val = shift; $val = 5; } sub test2 { $_[0] = 5; } my ($a, $b) = (3, 3); test1($a); # $a continues to be 3. test2($b); # $b is 5 now.

        But I could have missed something, too.

Re: Missing something with regards to references
by dpuu (Chaplain) on Nov 01, 2004 at 22:42 UTC
    Sometimes with these "what am I missing" type of questions, the best approach is to explain how I'd do it, and then see if that answer somehow hits the one thing you need to understand.

    Lets start with the function usage model: You suggest

    put_element( $DATA{3}, 'h', 'i', 'worked' );
    For clarity, lets do the whole hash. You can create a reference to the top level (root) hash using the backslash operator:
    put_element( \%DATA, 3, 'h', 'i', 'worked' );
    All those quotes get tiresome after a while, so lets get rid of them:
    put_element( \%DATA, qw(3 h i worked) );
    Perl flattens arg-lists, so that is identical to the previous version.

    So now the implementation (I'll omit the error-checks):

    sub put_element { my ($hashref, @path) = @_; # The leaf node and its value are special my $value = pop @path; my $leaf = pop @path; # walk the tree, create non-existing nodes for my $node (@path) { $hashref = $hashref->{$node} ||= {}; } # Finally, set the value at the leaf $hashref->{$leaf} = $value; }
    Hopefully the comments are sufficient to explain my thinking.

    --Dave
    Opinions my own; statements of fact may be in error.
      This is very elegant and makes much more sense to me, so thank you dpuu. But given some of the points that blokhead raised, are there any other gotchas with this approach I should be concerned about?
        I don't think so: my explicit default-assignment should avoid any autovivication issues.
        --Dave
        Opinions my own; statements of fact may be in error.
Re: Missing something with regards to references
by diotalevi (Canon) on Nov 01, 2004 at 23:22 UTC

    Use List::Util::reduce. You have a serious problem if you use the approach you selected. If you have already added "a", "b" = 1, what happens when you say "a", "b", "c" = 2? Using your existing code you will immediately die on symbolic references. Better, solve the problem by putting the child node references into a separate key and store your values in another.

    use List::Util 'reduce'; use Carp 'carp'; # a.tree => b.tree => c.tree => ... z.value sub put_element { 'HASH' eq ref( my $root = shift() ) or carp "put_element( \ %..., [ KEY ], $val ) must be a hash r +eference"; 'ARRAY' eq ref( my $keys = shift() ) or carp "put_element( \ %..., [ KEY ], $val ) must be given an + array reference"; my $value = shift; my $leaf = reduce { $a->{$b}{'tree'} ||= {} } $root, @{$keys}[ 0 . +. $#$keys - 1 ]; $leaf->{$keys->[-1]}{'value'} = $value; }
Re: Missing something with regards to references
by borisz (Canon) on Nov 01, 2004 at 23:16 UTC
    Instead of building the hash so complicated, build another key. In the old times perl did that for you.
    $data{qw/3 h i/} = 'worked'; $data{@array} = 'worked';
    this looks much easier to me.
    Boris
Re: Missing something with regards to references
by cLive ;-) (Prior) on Nov 01, 2004 at 22:27 UTC

    Err, if you have a sub you call like this:

    put_element($DATA{3}, 'h', 'i', 'worked');

    Why not just do:

    $DATA{3}{h}{i}='worked';

    Why write a sub to do it? Or am I missing something?

    cLive ;-)

    update: I am missing something, but I don't get exactly what on rereading your question ;-)

      What you're missing is that this is an example, this is not the actual application or the proper use of the function. It's been simplified in order to be able to directly address the issue I am having without the confusion of the full complexity of the program.

      The whole point of a data structure is to be dynamic... and if it's dynamic then I wont know whether I'm assigning to $DATA{3}{h}{i}, or $DATA{3}{h}{i}{j}, or $DATA{3}{h}{i}{j}{k}. So I need a function that adapts to the number of dimensions of the hash dynamically.

        Right, but my point is, what are you trying to achieve by calling this:
        put_element($DATA{3}, 'h', 'i', 'j', 'k', 'value');

        rather than this:

        $DATA{3}{h}{i}{j}{k} = 'value'

        I don't see how the sub optimizes anything, or provides anything that a direct call does.

        Unless you're calling it like this:

        my @arr = qw(h i j k); put_element($DATA{3}, @arr, 'value');

        Is that how you're using it?

        cLive ;-)

Re: Missing something with regards to references
by Anonymous Monk on Nov 01, 2004 at 22:52 UTC
    Note that if you call put_element($DATA{3}, 'h', 'i', 'worked');, and $DATA{3} isn't a reference, the main datastructure will not be changed. I'd write the function like this:
    sub put { die "Usage: put \\hash, key1, ..., keyn, value\n" unless @_ >= 3; my $hash = shift; die "Not a HASH reference\n" unless ref($hash) eq "HASH"; my $value = pop; my $key = pop; while (@_) { my $key = shift; if (!exists($hash->{$key})) { $hash = $hash->{$key} = { }; next; } elsif (ref($hash->{$key}) eq "HASH") { $hash = $hash->{$key}; next; } else { die "Not a HASH reference\n"; } } $hash->{$key} = $value; }
    Make sure you always pass in a reference to the put function.