hiyall has asked for the wisdom of the Perl Monks concerning the following question:

References to good books or urls covering this issue will be appreciated.

I am new to Perl and confused a bit by hash of hashes, array of hashes:

Given the following %HOH:

# the key field is an arbitrary int that needs to be unique - it serves no purpose other than to provide a unique key for the hash

%HOH = [
    { key => 1, person => "Mike", possession => "wallet, keys, car, house", age => 25, },
    { key => 2, person => "Mike", possession => "dog, cat, baseball bat", age => 25, },
    { key => 3, person => "Dave", possession => "pony, house, car, keys", age => 21, },
];

How would one consolidate the hash of hashes into another hash of hashes or array of hashes so that, if person and age are the same, the possessions are merged? The result should have one entry per person/age combination, with all possessions in the list of values for the possession key.

The desired result is:
%HOH = [
    { key => 1, person => "Mike", possession => "wallet, keys, car, house, dog, cat, baseball bat", age => 25, },
    { key => 3, person => "Dave", possession => "pony, house, car, keys", age => 21, },
];
or
@AOH = (
    { person => "Mike", possession => "wallet, keys, car, house, dog, cat, baseball bat", age => 25, },
    { person => "Dave", possession => "pony, house, car, keys", age => 21, },
);

Re: Confused on handling merging values for hash of hashes
by LanX (Saint) on Feb 04, 2015 at 13:38 UTC
    Please never

    %hash=[...]

    but

    %hash=(....)

    instead!

    Cheers Rolf

    PS: Je suis Charlie!

Re: Confused on handling merging values for hash of hashes
by hippo (Archbishop) on Feb 04, 2015 at 13:38 UTC

    The confusion may arise from the fact that those things which you have called %HOH are not hashes of hashes but arrays of hashes (strictly arrayrefs of hashes). You should not try to create them with the % sigil since they are refs. So, if you said:

    $aref = [
        { key => 1, person => "Mike", possession => "wallet, keys, car, house, dog, cat, baseball bat", age => 25, },
        { key => 3, person => "Dave", possession => "pony, house, car, keys", age => 21, },
    ];

    then you could print (or even just refer to) the possession of the second element like so:

    print $aref->[1]->{possession};
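A few more access patterns on the same kind of arrayref may help here. This is only a minimal sketch with a cut-down version of the data (the possession field is left out for brevity); the variable name $rec is just illustrative.

```perl
use strict;
use warnings;

# An arrayref of hashrefs, shortened from the example above.
my $aref = [
    { key => 1, person => "Mike", age => 25 },
    { key => 3, person => "Dave", age => 21 },
];

# The arrow between adjacent subscripts is optional.
print $aref->[1]{person}, "\n";        # same as $aref->[1]->{person}

# Dereference the whole array with @$aref to count or loop over it.
print scalar(@$aref), " records\n";
for my $rec (@$aref) {
    print "$rec->{person} is $rec->{age}\n";
}
```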

    HTH

      It's actually worse than that: %hash = [ { A => 1, B => 2 } ]; stringifies the arrayref and uses that string as the key. So this is just a hash with one meaningless string as its only key, and no value.

      hiyall, a hash is a structure that gives a name (a key) to everything it contains, so what is inside the { } is a hash (you have the value 1 with the name "A", and the value 2 with the name "B"). A hash is just a list of pairs (key and value), and lists are delimited by parentheses in Perl: %hash = (A => 1, B => 2);. You can see the { } as putting the list into a box (a reference, actually) that makes it easier to move around as a whole instead of carrying around all the elements (this is an approximation of what happens).

      While a hash allows you to get a value by its name, an array puts values in a certain order, at a specific position. So you can make an array with a list like so: @array = (1, 2, 3, 4);. And you can have a reference to an array with square brackets [ ], this is a single element that "contains" the whole array.

      So when you write %hash = [ { A => 1 } ]; you actually put a single element (the [ ] "box") into something that expects names and values.
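What ends up in such a hash can be checked directly. The following is a small sketch rather than anything from the original post; the names %hash, @keys and %fixed are only illustrative, and the warning is switched off locally so the demonstration runs quietly.

```perl
use strict;
use warnings;

my %hash;
{
    # With warnings on, Perl reports "Reference found where even-sized
    # list expected" here; silence it for this demonstration only.
    no warnings 'misc';
    %hash = [ { A => 1, B => 2 } ];
}

my @keys = keys %hash;
print scalar(@keys), "\n";    # 1: the whole [ ] "box" became a single key

# The key is the arrayref turned into a string such as ARRAY(0x55d0c8a1b2c0).
print $keys[0] =~ /^ARRAY\(0x[0-9a-f]+\)$/
    ? "stringified arrayref\n"
    : "something else\n";

# There was no second list element, so the value is undef.
print defined $hash{ $keys[0] } ? "has a value\n" : "value is undef\n";

# What was probably intended: a plain list of pairs in parentheses.
my %fixed = ( A => 1, B => 2 );
print join( ",", map { "$_=$fixed{$_}" } sort keys %fixed ), "\n";    # A=1,B=2
```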

      By default, Perl will let you do a lot of things. To have it warn you about this kind of mistake, you should add:

      use strict; use warnings;
      at the top of your program.

      To better understand what you are using, and to check that what you get out of a structure is what you put into it in the first place, you can try:

      use Data::Dumper; # at the top of your program
      my %hash = (Pablo => "Dog", Rex => "Cat", TheSpiritOfGodzilla => "Fish");
      print Dumper \%hash;

      You'll have to read some documentation if you want to go anywhere. You can try perlsyn and perldsc for a start.

Re: Confused on handling merging values for hash of hashes
by choroba (Cardinal) on Feb 04, 2015 at 13:44 UTC
    The syntax you use is not correct. Populating a hash with square brackets is almost 100% wrong:
    # WRONG
    %HOH = [ ... ];

    If key is always unique, why isn't it the hash key? Read the comments to understand the steps I'd take:

    #!/usr/bin/perl
    use warnings;
    use strict;

    use Data::Dumper;

    my @AoH = (
        { key => 1, person => "Mike", possession => "wallet, keys, car, house", age => 25, },
        { key => 2, person => "Mike", possession => "dog, cat, baseball bat", age => 25, },
        { key => 3, person => "Dave", possession => "pony, house, car, keys", age => 21, },
    );

    # Make "key" the hash key:
    my %HoH;
    for my $hash (@AoH) {
        my $key = delete $hash->{key};
        $HoH{$key} = $hash;
    }
    print Dumper \%HoH;

    # In fact, using an array for the possessions would be even better.
    for my $hash (@AoH) {
        $hash->{possession} = [ split /, /, $hash->{possession} ];
    }
    print Dumper \@AoH;

    # You don't need any key. You want to hash by person and age.
    # Using push merges Mike's possessions.
    my %HoH2;
    for my $hash (@AoH) {
        push @{ $HoH2{ $hash->{person} }{ $hash->{age} } }, @{ $hash->{possession} };
    }
    print Dumper \%HoH2;
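If the flat @AOH form from the question is wanted instead of the nested hash, the same grouping idea can be bent that way. This is only one possible sketch, not part of the code above: the names %merged, @result and $id are made up for illustration, and joining person and age with "\0" assumes that character never appears in the data.

```perl
use strict;
use warnings;

my @AoH = (
    { person => "Mike", possession => "wallet, keys, car, house", age => 25 },
    { person => "Mike", possession => "dog, cat, baseball bat",   age => 25 },
    { person => "Dave", possession => "pony, house, car, keys",   age => 21 },
);

my ( %merged, @result );
for my $rec (@AoH) {
    # One entry per person/age combination.
    my $id    = join "\0", $rec->{person}, $rec->{age};
    my $entry = $merged{$id};
    unless ($entry) {
        $entry = $merged{$id} =
            { person => $rec->{person}, age => $rec->{age}, possession => [] };
        push @result, $entry;    # remember first-seen order
    }
    push @{ $entry->{possession} }, split /,\s*/, $rec->{possession};
}

# Turn each possession list back into a comma-separated string.
$_->{possession} = join ", ", @{ $_->{possession} } for @result;

printf "%s (%d): %s\n", @{$_}{qw(person age possession)} for @result;
# Mike (25): wallet, keys, car, house, dog, cat, baseball bat
# Dave (21): pony, house, car, keys
```

Keeping @result next to %merged preserves the input order, which a hash alone would not do.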
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Confused on handling merging values for hash of hashes
by shmem (Chancellor) on Feb 04, 2015 at 21:21 UTC

    Read perlref. Again. Play with the structures.

    Your %HOH corrected:

    %HOH = ( # not [ !
        1 => { # key => 1,
            person => "Mike",
            possession => "wallet, keys, car, house",
            age => 25,
        },
        2 => { # key => 2,
            person => "Mike",
            possession => "dog, cat, baseball bat",
            age => 25,
        },
        3 => { # key => 3,
            person => "Dave",
            possession => "pony, house, car, keys",
            age => 21,
        },
    ); # not ] !

    Likewise for your @AOH. Spot the error. Read perldata and perlref, again. Now to your ostensible problem:

    How would one consolidate the hash of hashes to another hash of hashes or array of hashes so that if person and age were same, the possession would be a merge of the possession ... resulting in one entry per person/age combination but with all possessions in the array(list) of values for the possession key?
    use Data::Dumper;
    use strict;
    use warnings;

    my %HOH = ( # not [ !
        1 => { # key => 1,
            person => "Mike",
            possession => "wallet, keys, car, house",
            age => 25,
        },
        2 => { # key => 2,
            person => "Mike",
            possession => "dog, cat, baseball bat",
            age => 25,
        },
        3 => { # key => 3,
            person => "Dave",
            possession => "pony, house, car, keys",
            age => 21,
        },
    ); # not ] !

    my %NewHOH; # consolidated hash.
    { # limit scope of %seen to this block
        my %seen; # intermediate hash. Key is "person:age"
        for my $key (keys %HOH) { # iterate over the keys of the hash
            my $val = $HOH{$key}; # get the value - an anonymous hash
            my $composite_key = join ":", $HOH{$key}->{person}, $HOH{$key}->{age};
            if ($seen{$composite_key}++) { # note: post increment!
                # we make the comma-separated string into an array.
                # then we build a hash using the array elements as keys.
                # this avoids duplicates. We later convert that back.
                my @possessions = split /,\s*/, $HOH{$key}->{possession};
                for my $item (@possessions) {
                    $NewHOH{$composite_key}->{possession}->{$item}++;
                }
            }
            else {
                # first occurence of this "person:age" key
                $NewHOH{$composite_key} = $HOH{$key};
                # same as above. Make a hash of the items of "possession"
                $NewHOH{$composite_key}->{possession} = {
                    map { $_, 1 } split /,\s*/, $HOH{$key}->{possession}
                };
                # record the first key, we'll use that to re-key the hash
                $NewHOH{$composite_key}->{key} = $key;
            }
        }
    }

    # now get rid of composite keys and make the anon arrays
    # of the "possession" value into a string
    for my $key ( keys %NewHOH ) {
        my $val = delete $NewHOH{$key}; # returns the anon hash
        my $new_key = delete $val->{key}; # old key which was remembered
        $NewHOH{$new_key} = $val;
        $val->{possession} = join ", ", keys %{$val->{possession}};
    }

    # debug output
    $Data::Dumper::Indent = 1;
    my $d = Data::Dumper->new( [\%NewHOH], ['NewHOH'], );
    print $d->Dump;
    __END__
    $NewHOH = {
      '1' => {
        'possession' => 'house, keys, baseball bat, cat, dog, car, wallet',
        'person' => 'Mike',
        'age' => 25
      },
      '3' => {
        'possession' => 'keys, house, pony, car',
        'person' => 'Dave',
        'age' => 21
      }
    };

    Note that after __END__, the output shows an anonymous hash! See Data::Dumper, map and split. See perlref, perldata.

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
Re: Confused on handling merging values for hash of hashes
by locked_user sundialsvc4 (Abbot) on Feb 04, 2015 at 17:47 UTC

    Just a few “food for” thoughts here . . .

    (1)   It does you no good at all if you have a unique-integer “only to serve as the key to a hash.”   A hash-key should be something that you search for.   It could be a string that combines several fields that you need to search-for at once, e.g. “Mike:24”.   If you’re going to choose a hash, choose (or coin) a key that works for you.

    (2)   It’s perfectly legitimate for one thing (say, a hashref corresponding to “one record”) to be referenced by more than one hash at a time, much as database rows might be referenced by more than one index.   If you need to store more than one hashref under a single key, simply make the index-hash point to an arrayref containing zero-or-more hashrefs.   (Google about Perl’s auto-vivification feature; also exists().)

    (3)   If you need to track the person’s possessions, a string is certainly one way to do it.   Another way is to use an arrayref.   A third is to use a hashref in which the value being stored is just a dummy-value, and the key is what you are actually interested in.   When the time comes to produce the list of possessions, you can build it with e.g. join(", ", sort keys %$myhash).

    (4)   The most important thing to wrap your head around, with regards to Perl, is the all-important idea of references.   There is, for example, no such thing as a hash “of hashes.”   No, what the hash actually contains is a reference to something else.   (And if that something-else is a hash, we call the reference to it a “hashref.”)   But “the thing being referred to” is one thing, and every one of the references to it is a separate, distinct memory object of a “scalar” type.   It is quite common for there to be things hanging-around in memory which are anonymous values:   no variable contains or refers-to them directly.   It’s perfectly all right for there to be more-than-one reference to the same (anonymous, or not ...) thing.   Perl has a very efficient and powerful memory manager which maintains reference-counts for everything and which keeps storage tidy and neat.

    (5)   Perl’s syntax is also designed around the notion that “there’s more than one way to say the same thing,” especially when it comes to referring to things in memory.   This can be quite confusing if you are more accustomed to strongly-typed languages, “the one ‘right’ way,” and getting rewarded with syntax error messages at compile-time if you don’t write your code in just that way.   Quite frankly, I found it baffling at first.   Even when you use strict; use warnings;, Perl is an extremely dynamic system.   This is part of what makes it so damned powerful, but do not apologize for feeling bewildered.
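Points (1) to (3) above can be put together in a few lines. What follows is a rough sketch under assumed names (%by_person_age, %by_person, %owned) and with shortened sample data, not a definitive way to do it.

```perl
use strict;
use warnings;

my @records = (
    { person => "Mike", age => 25, possession => "wallet, keys" },
    { person => "Mike", age => 25, possession => "dog, keys" },
    { person => "Dave", age => 21, possession => "pony" },
);

# (1) and (2): two index hashes, each mapping a search key to an arrayref
# of hashrefs. push autovivifies the arrayref the first time a key is used.
my ( %by_person_age, %by_person );
for my $rec (@records) {
    my $id = join ":", $rec->{person}, $rec->{age};    # e.g. "Mike:25"
    push @{ $by_person_age{$id} }, $rec;
    push @{ $by_person{ $rec->{person} } }, $rec;
}

# Both indexes hold references to the same record, not copies of it.
print $by_person_age{"Mike:25"}[0] == $by_person{Mike}[0]
    ? "same record\n"
    : "different records\n";

# (3): possessions as hash keys with a dummy value, which drops duplicates.
my %owned;
for my $rec ( @{ $by_person_age{"Mike:25"} } ) {
    $owned{$_} = 1 for split /,\s*/, $rec->{possession};
}
print join( ", ", sort keys %owned ), "\n";    # dog, keys, wallet

# exists() tests for a key without creating it.
print exists $by_person_age{"Anna:30"} ? "found\n" : "no such key\n";
```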