Henri has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

this is probably simple, but still I am stuck. How do you access a hash through its hash reference and test if part of it (a subhash) exists? What I would like to do is print individuals only if coordinate information is available for them. Information about the individuals is stored in one hash accessed through a hash reference, coordinate information in another hash accessed through a hash reference. This is the code:

use strict;
use warnings;

my $individuals = {};
my $coordinates = {};

# Fill the hashes associated with $individuals and $coordinates
# in subroutines so that
# $$coordinates{$group}{$id}{$stage}{"coords"}{$coord_no} = $value

print_coords($individuals, $coordinates);

sub print_coords {
    my ($individuals, $coordinates) = @_;
    foreach my $group (sort keys %$individuals) {
        foreach my $id (sort keys %{$$individuals{$group}}) {
            INDIVIDUAL:
            foreach my $stage (sort keys %{$$individuals{$group}{$id}}) {
                # here the test
                if (!(%{$$coordinates{$group}{$id}{$stage}{"coords"}})) {
                    next INDIVIDUAL;
                }
                # print individual and coordinate information
            }
        }
    }
}

Running it results in: "Can't use string ("") as a HASH ref while "strict refs" in use at ..."

I have been struggling with this for a while and would be very happy if you could help me with it. Henri

Replies are listed 'Best First'.
Re: Test if a subhash in a referenced hash exists
by moritz (Cardinal) on May 28, 2010 at 08:34 UTC
    if (not exists $coordinates->{$group}{$id}{$stage}{"coords"}) { ...}

    (untested)

    Perl 6 - links to (nearly) everything that is Perl 6.
      If $group, $id, $stage etc don't exist in $coordinates, then that will auto-vivify them. You need to check the existence of each level in the hash if magically creating them merely by looking at them is a problem for your application. Something like this (ignoring your application's logic for the sake of making my example code clearer):
      if (exists($coordinates->{$group})) {
          if (exists($coordinates->{$group}->{$id})) {
              if (exists($coordinates->{$group}->{$id}->{$stage})) {
                  ...
              }
          }
      }
      or to generalise (untested. haven't even tried to compile it) ...
      my $result = do_stuff_in_hash_without_autovivifying(
          sub {
              my $hash_to_work_on = shift;
              # do stuff here
          },
          $coordinates,                    # initial hash
          $group, $id, $stage, 'coords'    # list of keys to traverse
      );

      sub do_stuff_in_hash_without_autovivifying {
          my ($sub, $hash, @keys) = @_;
          if (@keys && exists($hash->{$keys[0]})) {
              # descend one level and recurse with the remaining keys
              return do_stuff_in_hash_without_autovivifying(
                  $sub, $hash->{$keys[0]}, @keys[1 .. $#keys]
              );
          }
          else {
              # no keys left, or the next key is missing: call the
              # callback on the deepest hash reached so far
              return $sub->($hash);
          }
      }
      If you're *really* paranoid about auto-vivification, then you can subvert Tie::Hash::Vivify to turn it into a fatal error instead of silent data corruption:
      use Tie::Hash::Vivify;
      use Data::Dumper;

      my $hash = Tie::Hash::Vivify->new(sub {
          die("No auto-vivifying! Bad programmer! No bikkit!\n" . Dumper(\@_))
      });
        Another way to prevent autovivification is to use Data::Diver.
        Thanks for pointing out that autovivified hash parts stay around. Reading about autovivification I always thought that those parts would appear on the fly and be gone again after they have been looked at. Since I actually might test hashes several times, I will be on the lookout for this situation.
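
The point that autovivified parts stay around can be checked with a minimal sketch like the one below. It assumes an empty $coordinates and reuses the key values "AC", "132" and "0" that appear later in the thread; the variable names $found and $vivified are made up for the illustration.

```perl
use strict;
use warnings;

my $coordinates = {};    # empty: nothing has been stored yet

# Merely *testing* a deep element creates every intermediate level.
my $found = exists $coordinates->{AC}{132}{0}{coords};

# The test itself is false, but the path up to the last key now exists.
my $vivified = exists $coordinates->{AC}{132}{0};

print "found: ",    ($found    ? "yes" : "no"), "\n";    # found: no
print "vivified: ", ($vivified ? "yes" : "no"), "\n";    # vivified: yes
```

So a later loop over keys %$coordinates would see "AC" even though no coordinates were ever stored for it.
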
      Moritz, your reply helped me find the problem. After your suggestion did not work either, I rechecked the subroutine and found that I had not passed $coordinates into it. Doing so, now your suggestion and my original line both work. Also, I thought about exists, defined and true and think I should use exists, because it is a prerequisite to defined and true. Thanks a lot for your help! Henri

        ...but note that exists does not test what you asked for in the subject line, i.e. if a subhash exists — it just tests whether the respective hash key exists, or more precisely, whether the respective hash element has ever been initialized (with whatever value, including undef).

        $data->{foo} = "";
        if ( exists $data->{foo} ) {
            print $data->{foo}{bar};   # 'Can't use string ("") as a HASH ref ...'
        }

        or with undef as value (in which case the missing hash would be autovivified)

        $data->{foo} = undef;
        if ( exists $data->{foo} ) {
            print $data->{foo}{bar};   # 'Use of uninitialized value in print ...'
        }
Re: Test if a subhash in a referenced hash exists
by almut (Canon) on May 28, 2010 at 10:09 UTC
    "Can't use string ("") as a HASH ref while "strict refs" in use

    That error results from trying to evaluate an empty string as a hash reference.  In other words, the part you have in between %{...} most likely evaluates to the empty string, instead of the hashref that would be needed here:

    use strict;
    use warnings;

    my $data = {};
    $data->{foo} = { a => 1 };
    $data->{bar} = "";

    if ( %{ $data->{foo} } ) { ... }   # ok, because $data->{foo} is a hashref
    if ( %{ $data->{bar} } ) { ... }   # not ok, because $data->{bar} is empty

    Best is probably to directly test whether the value in question is a hashref, i.e.

    if ( ref($data->{bar}) eq "HASH" ) { ... }   # ok, even if $data->{bar} is empty/undef

    or, adapted to your example:

    ...
    next unless ref($coordinates->{$group}{$id}{$stage}{"coords"}) eq "HASH";
    # print individual and coordinate information

    See ref.
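
    The ref test can also be applied level by level, which sidesteps autovivification too. What follows is only a sketch: subhash_at is a hypothetical helper, not something from this thread or from a module, and the sample data (including the "BD" entry holding an empty string) is invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: walk down the keys one level at a time and return
# the hashref found at the end, or undef if any level is missing or is
# not a hash reference. Nothing is autovivified, because a level is only
# looked into after it has been confirmed to be a hash holding the key.
sub subhash_at {
    my ($node, @keys) = @_;
    for my $key (@keys) {
        return unless ref($node) eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return ref($node) eq 'HASH' ? $node : undef;
}

my $coordinates = {
    AC => { 132 => { 0 => { coords => { 1 => { name => 'Xgeo' } } } } },
    BD => { 7   => { 0 => { coords => "" } } },    # stray empty string
};

my $ok      = subhash_at($coordinates, 'AC', 132, 0, 'coords');  # hashref
my $empty   = subhash_at($coordinates, 'BD', 7,   0, 'coords');  # undef
my $missing = subhash_at($coordinates, 'ZZ', 1,   0, 'coords');  # undef
```

    In the loop from the question this would read something like next unless subhash_at($coordinates, $group, $id, $stage, 'coords'); and $coordinates would gain no new keys from the test.
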

      The empty string crept in since I had a legacy line in my code that set $coordinates = "" <grmpf>. I am trying to clean up a script that has grown over the past years. I had tried ref, but it had not worked, probably due to my notation confusion. Your %{$data->{foo}} notation is another variant to the ones I just posted.
Re: Test if a subhash in a referenced hash exists
by Henri (Novice) on May 29, 2010 at 13:12 UTC
    Thanks to all of you, your comments are really helpful to me. The background of my question is that I am at a point where my scripts are getting more numerous and complex, while I am still just repeating what has been working in the past without necessarily understanding why. To be able to make a step ahead I dearly need to clarify some basic concepts. These seem obvious when I read about them, but get mangled when I try to apply them. I am just not comfortable with (de-)referencing complex data structures and testing for existence.

    Part of my confusion seems to boil down to the following:
    All of you write my hash structures like this

    if (not exists $coordinates->{$group}{$id}{$stage}{"coords"}) {
    and as I understand the two following notations are equivalent
    if (not exists ${$coordinates}{$group}{$id}{$stage}{"coords"}) {
    if (not exists ${${${${$coordinates}{$group}}{$id}}{$stage}}{"coords"})
    All three work, they are dereferencing the hash reference $coordinates and provide the hash reference of the anonymous subhash that has at its top level the coordinate numbers as keys.

    The following tests for the $coordinates-> notation also go through without error:

    if (not defined $coordinates->{$group}{$id}{$stage}{"coords"}) {
    if (!($coordinates->{$group}{$id}{$stage}{"coords"})) {
    However, what about %{$$coordinates{$group}{$id}{$stage}{"coords"}} that is used in
    foreach my $coord_no (keys %{$$coordinates{$group}{$id}{$stage}{"coords"}}) {…}
    I always thought this is equivalent and also would give the hash reference of the anonymous subhash? But
    if (not exists %{$$coordinates{$group}{$id}{$stage}{"coords"}}) {
    # exists argument is not a HASH or ARRAY element
    Thus, what does it return? And why is what it returns defined and true, but does not exist:
    if (not defined %{$$coordinates{$group}{$id}{$stage}{"coords"}}) {
    if (!(%{$$coordinates{$group}{$id}{$stage}{"coords"}})) {
    both go through just fine.

    To make my confusion complete, tests of a similar hash give different results:

    if (not exists %{$$original_data{$group}{$id}{$stage}}) {
    # exists argument is not a HASH or ARRAY element
    if (not defined %{$$original_data{$group}{$id}{$stage}}) {
    # works
    if (!(%{$$original_data{$group}{$id}{$stage}})) {
    # Can't use an undefined value as a HASH reference
    Mmh.
      Have a look at the very useful References quick reference.

      Also, the keys doc. The argument for keys must be a hash and has a % sigil.

      for my $key (keys %hash){
          #...
      }
      for my $key (keys %{$hashref}){   # dereferencing a hash
          #...
      }
      The exists doc tells you its argument is a hash element which will have a $ sigil. You need to read your error message
      exists argument is not a HASH or ARRAY element
      as
      exists argument is neither a HASH element nor an ARRAY element
      if (exists $hash{some_key}){
          #..
      }
      if (exists $hashref->{some_key}){   # dereferencing a hash element using an arrow
          #..
      }
      An observation.
      ...what about %{$$coordinates{$group}{$id}{$stage}{"coords"}}...
      The %{...} dereferences the subhash. Inside it, $coordinates itself still has to be dereferenced, either with the extra $ in $$ or with ->. I prefer the -> for dereferencing (many monks don't), but between subsequent subscripts it is unnecessary (as in this case).

      Have a look at the tutorial linked to above. I look at it at least once a week. :-)

      update: When I'm having a fight with a convoluted data structure I try out the syntax on a simplified version and get that working first. And while you're doing that don't forget the mighty Data::Dumper. Good luck!

        wfsp, your mentioning that exists and keys need different types of input in combination with Data::Dumper got me on the right track. I checked the results of the different notations with Data::Dumper and voila! With this a major brain knot got untangled and some of the error messages are now actually starting to make sense to me.
        print Dumper($coordinates->{"AC"}{"132"}{"0"}{"coords"});

        $VAR1 = {
            '1' => {
                'value' => '4411478.623',
                'name' => 'Xgeo'
            },
            '2' => {
                'value' => '5953375.013',
                'name' => 'Ygeo'
            }
        };
        Here a scalar is returned. $VAR1 I think is the hash reference of the subhash.
        print Dumper(%{$$coordinates{"BD"}{"132"}{"0"}{"coords"}});

        $VAR1 = '1';
        $VAR2 = {
            'value' => '4411478.623',
            'name' => 'Xgeo'
        };
        $VAR3 = '2';
        $VAR4 = {
            'value' => '5953375.013',
            'name' => 'Ygeo'
        };
        This returns the hash contents (keys and the references of the next level subhashes) as a list.

        From how exists and defined behave, I take it that exists requires a single element, while defined is able to handle lists, too. I have to check that. But I take it I should test for the subhash reference ($hashref->) as almut pointed out and not test the hash list itself (%{$$hashref...}).
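
The difference between the single scalar and the flattened list can be made visible without Data::Dumper. This is a small sketch using the sample values from the dumps above; the variable names ($subhash, @flat, $has_coords and so on) are made up for the illustration.

```perl
use strict;
use warnings;

my $coordinates = {
    AC => { 132 => { 0 => { coords => {
        1 => { value => '4411478.623', name => 'Xgeo' },
        2 => { value => '5953375.013', name => 'Ygeo' },
    } } } },
};

my $subhash = $coordinates->{AC}{132}{0}{coords};   # one scalar: a hash reference
my @flat    = %{$subhash};                          # a list: key, value, key, value

my $kind  = ref $subhash;                # 'HASH'
my $items = scalar @flat;                # 4 (two keys plus two values)
my $count = scalar keys %{$subhash};     # 2

# Test the reference first, and only then look at the contents.
my $has_coords = ref($subhash) eq 'HASH' && %{$subhash} ? 1 : 0;
```

Testing ref before dereferencing means a stray empty string or undef in place of the subhash makes $has_coords false instead of raising "Can't use string ("") as a HASH ref".
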

        The refs quick ref page and the keys page were new to me, thanks for pointing me there. Most recently I have been looking at perlref, perlreftut, perlsub, perlvar, perldata and exists. Generally, most of the examples seem to deal with the topmost layer of a HoH or its final leaves layer, i.e. when you finally access a scalar. Examples for the middle of a HoH when you have to deal with subhashes and subarrays, however, are much rarer. Maybe it is just obvious, but to me it's not intuitive from the start. At the topmost layer you access the hash (reference) - results are from print

        $hashref      # HASH(0x183e2e8)
        %{$hashref}   # %{HASH(0x183e2e8)}
        %$hashref     # %{HASH(0x183e2e8)}
        and you access the leaves
        $$hashref{$key1}{$key2}{$key3} # = scalar value
        In the middle you always need the extra dereference to first get at the subhash reference (a scalar) and then with it access the subhash:
        $$hashref{$key1}{$key2}{$key3}      # HASH(0x18b9df8)
        $hashref->{$key1}{$key2}{$key3}     # HASH(0x18b9df8)
        %{$$hashref{$key1}{$key2}{$key3}}   # 1HASH(0x18b9e58)2HASH(0x18b9e88)
        The following notations don’t work:
        %{$hashref{$key1}{$key2}{$key3}}   # Global symbol %hashref requires explicit package name
        %$hashref{$key1}{$key2}{$key3}     # syntax error

        Your comments and suggestions really helped me to cut that knot and understand a bit better what I am doing. Thanks to all of you!