combining hashes based on key values

punkish has asked for the wisdom of the Perl Monks concerning the following question:

perldata monks,

I have

my %in_hash = (
  jibber => [
              {id =>  1, score => 3, name => 'foo'},
              {id =>  5, score => 1, name => 'bar'},
              {id => 22, score => 6, name => 'baz'},
            ],
  jabber => [
              {id =>  3, score => 1, name => 'boo'},
              {id =>  5, score => 3, name => 'bar'},
              {id => 12, score => 2, name => 'zib'},
              {id => 22, score => 2, name => 'baz'},
            ],
);

and I want


my %out_hash = (
  'jibber jabber' => [
                        {id =>  5, score => 4, name => 'bar'},
                        {id => 22, score => 8, name => 'baz'},
                     ],
  'jibber'        => [
                        {id =>  1, score => 3, name => 'foo'},
                     ],
  'jabber'        => [
                        {id =>  3, score => 1, name => 'boo'},
                        {id => 12, score => 2, name => 'zib'},
                     ],
);

%out_hash combines the entries of %in_hash: records whose id appears under both keys are merged into one record with their scores summed and filed under the combined key, while each original key keeps only the records whose ids are unique to it.

Monks that might ask what I have done thus far -- to them I spake, nothing more than constructing the above example hashes. Essentially, I am drawing a blank (other than doing some terrible, long-drawn, brute force method the very thought of which numbs me).

Update: Corrected the hash defs as per the most fascinating tlm

--

when small people start casting long shadows, it is time to go to bed


Re: combining hashes based on key values by davido (Cardinal) on Jun 08, 2005 at 04:00 UTC
You've got several smaller chores that add up to the solution to your problem. Tackling the small chores will help you solve the bigger one.

In %out_hash, 'jibber jabber' holds those ids that are found in both 'jibber' and 'jabber'. This is an intersection. You also want %out_hash to keep the ids from 'jibber' that are unique to 'jibber' under the key named 'jibber', and the same for 'jabber'. This is the symmetric difference.

You may get a start by looking at List::Compare, and by reading perlfaq4 under the section called "How do I compute the difference of two arrays? How do I compute the intersection of two arrays?"

For your purposes, your input hashes are simply sparse arrays. Treat them as lists of ids indexed by hash keys instead of array indices, and you'll find your way around the problem.

This response assumes you already know your way around Perl's references and complex data structures. If you need help tackling those too, just let us know where you need clarification.

Dave
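The intersection and the "only in one list" pieces described above can be sketched in core Perl along the lines of the perlfaq4 recipe. This is a minimal illustration, not davido's code: the variable names are made up for the example, and the id lists are typed in by hand (in practice they could come from something like `map { $_->{id} } @{ $in_hash{jibber} }`).

```perl
use strict;
use warnings;

# Ids taken from the question's 'jibber' and 'jabber' lists.
my @jibber_ids = (1, 5, 22);
my @jabber_ids = (3, 5, 12, 22);

# Build a lookup set for each list (hypothetical helper hashes).
my %in_jibber = map { $_ => 1 } @jibber_ids;
my %in_jabber = map { $_ => 1 } @jabber_ids;

# Intersection: ids that appear in both lists.
my @both = grep { $in_jabber{$_} } @jibber_ids;

# Ids that appear in only one of the two lists.
my @jibber_only = grep { !$in_jabber{$_} } @jibber_ids;
my @jabber_only = grep { !$in_jibber{$_} } @jabber_ids;

print "both: @both\n";                 # both: 5 22
print "jibber only: @jibber_only\n";   # jibber only: 1
print "jabber only: @jabber_only\n";   # jabber only: 3 12
```

Because `grep` preserves the order of its input list, each result keeps the order in which the ids appeared in the original list.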
Re^2: combining hashes based on key values by punkish (Priest) on Jun 08, 2005 at 04:09 UTC
Very sweet. You have set me on the right path... not the shortest path, but the most interesting, meandering, full-of-knowledge path. There seems to be much fun in List::Compare. Thanks.
Re^3: combining hashes based on key values by davido (Cardinal) on Jun 08, 2005 at 05:00 UTC
As I look at it again, you're more interested in the left complement and right complement than in the symmetric difference, because you're going to need to build a list based on what's only in jibber, a list based on what's only in jabber, and then the intersection (a list based on what's in both jibber and jabber). List::Compare is going to get you most of the way there, if you look at the two input lists as:

@{$in_hash{jibber}}   # Left list
@{$in_hash{jabber}}   # Right list

Dave
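One way this could look with List::Compare is sketched below. It assumes the module is installed from CPAN (it is not part of core Perl) and compares the ids rather than the records themselves; the `%left` and `%right` index hashes are names invented for this example. `get_unique` returns the items found only in the left list, `get_complement` those found only in the right list, and `get_intersection` those found in both.

```perl
use strict;
use warnings;
use List::Compare;   # CPAN module, assumed to be installed

my %in_hash = (
  jibber => [
    {id =>  1, score => 3, name => 'foo'},
    {id =>  5, score => 1, name => 'bar'},
    {id => 22, score => 6, name => 'baz'},
  ],
  jabber => [
    {id =>  3, score => 1, name => 'boo'},
    {id =>  5, score => 3, name => 'bar'},
    {id => 12, score => 2, name => 'zib'},
    {id => 22, score => 2, name => 'baz'},
  ],
);

# Index each list by id so records can be looked up after the comparison.
my %left  = map { $_->{id} => $_ } @{ $in_hash{jibber} };
my %right = map { $_->{id} => $_ } @{ $in_hash{jabber} };

# Compare the two lists of ids.
my $lc = List::Compare->new([ keys %left ], [ keys %right ]);

my %out_hash;

# Ids found on one side only: copy the records unchanged.
$out_hash{jibber} = [ map { +{ %{ $left{$_}  } } } $lc->get_unique     ];
$out_hash{jabber} = [ map { +{ %{ $right{$_} } } } $lc->get_complement ];

# Ids found on both sides: copy the jibber record and sum the two scores.
$out_hash{'jibber jabber'} = [
  map { +{ %{ $left{$_} }, score => $left{$_}{score} + $right{$_}{score} } }
    $lc->get_intersection
];
```

The records are copied (`+{ %{...} }`) so that %in_hash is left untouched. The order of records within each output list follows whatever order List::Compare returns the ids in, so it should not be relied on.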
Re: combining hashes based on key values by jdporter (Paladin) on Jun 08, 2005 at 04:15 UTC
# first, determine all the ids and which sets they appear in.
# presume that no id can appear more than once per set.
# also initialize the (one) record per id for the result.
my %by_id;
my %out_rec;
for my $set ( keys %in_hash ) {
    for ( @{ $in_hash{$set} } ) {
        $by_id{$_->{'id'}}{$set} = $_;
        %{$out_rec{$_->{'id'}}} = %$_;
    }
}

my %out_hash;
for my $id ( keys %by_id ) {
    # the sets are sorted, so the combined key comes out as "jabber jibber"
    my @sets = sort keys %{ $by_id{$id} };
    $out_rec{$id}{'score'} = 0;
    $out_rec{$id}{'score'} += $by_id{$id}{$_}{'score'} for @sets;
    push @{ $out_hash{"@sets"} }, $out_rec{$id};
}
# and at this point %out_hash contains the desired data.
Re: combining hashes based on key values by tlm (Prior) on Jun 08, 2005 at 04:20 UTC
Note that the original definitions of the hashes were wrong; use `()`, not `{}`. The latter are for defining refs to anonymous hashes.

The solution below assumes that there is a 1-to-1 correspondence between ids and names; this seems problematic to me, since the two can easily get out of sync. I think that a better design would be to have only ids and scores in the input hash, and have a separate hash only for associating ids and names.

Read more... (2 kB)

the lowliest monk
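The alternative layout suggested here might look something like the following sketch. It is not tlm's code (which is not included in this copy of the thread); the hash names `%name_of` and `%score_in` and the exact shape of the data are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical layout: names are stored once, keyed by id.
my %name_of = (1 => 'foo', 3 => 'boo', 5 => 'bar', 12 => 'zib', 22 => 'baz');

# Each set maps id => score only.
my %score_in = (
  jibber => { 1 => 3, 5 => 1, 22 => 6 },
  jabber => { 3 => 1, 5 => 3, 12 => 2, 22 => 2 },
);

# Collect, per id, the total score and the sets the id appears in.
# 'reverse sort' visits jibber before jabber, giving the key 'jibber jabber'.
my (%total, %sets_for);
for my $set (reverse sort keys %score_in) {
  while (my ($id, $score) = each %{ $score_in{$set} }) {
    $total{$id} += $score;
    push @{ $sets_for{$id} }, $set;
  }
}

# Group the ids under a key naming the sets they belong to,
# attaching the name from the separate lookup hash.
my %out_hash;
for my $id (sort { $a <=> $b } keys %total) {
  my $key = join ' ', @{ $sets_for{$id} };
  push @{ $out_hash{$key} },
    { id => $id, score => $total{$id}, name => $name_of{$id} };
}
```

With the names kept in a single hash, a name can no longer disagree between the two sets, which is the synchronization problem raised above.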
Re: combining hashes based on key values by mifflin (Curate) on Jun 08, 2005 at 04:31 UTC
I think I got it. But I had to create a temp hash to build it.

use strict;
use warnings;
use Data::Dumper;

my %in_hash = (
  jibber => [
              {id =>  1, score => 3, name => 'foo'},
              {id =>  5, score => 1, name => 'bar'},
              {id => 22, score => 6, name => 'baz'},
            ],
  jabber => [
              {id =>  3, score => 1, name => 'boo'},
              {id =>  5, score => 3, name => 'bar'},
              {id => 12, score => 2, name => 'zib'},
              {id => 22, score => 2, name => 'baz'},
            ],
);

# Merge records by id, summing scores and noting which keys each id came from.
my %tmp;
for my $key (keys %in_hash) {
  for my $href (@{$in_hash{$key}}) {
    my $id = $href->{id};
    if (exists($tmp{$id})) {
      $tmp{$id}->{score} += $href->{score};
    }
    else {
      $tmp{$id} = { %$href };   # copy, so %in_hash is not modified
    }
    $tmp{$id}->{keys}->{$key}++;
  }
}

# Regroup the merged records under a key built from their source keys.
my %out_hash;
for my $id (keys %tmp) {
  my $keys = $tmp{$id}->{keys};
  delete $tmp{$id}->{keys};
  my $new_key = join(' ', reverse sort keys %$keys);
  push(@{$out_hash{$new_key}}, $tmp{$id});
}

print Dumper \%out_hash;

output...

C:\tmp>erick.pl
$VAR1 = {
          'jibber' => [
                        {
                          'name' => 'foo',
                          'score' => 3,
                          'id' => 1
                        }
                      ],
          'jabber' => [
                        {
                          'name' => 'boo',
                          'score' => 1,
                          'id' => 3
                        },
                        {
                          'name' => 'zib',
                          'score' => 2,
                          'id' => 12
                        }
                      ],
          'jibber jabber' => [
                               {
                                 'name' => 'baz',
                                 'score' => 8,
                                 'id' => 22
                               },
                               {
                                 'name' => 'bar',
                                 'score' => 4,
                                 'id' => 5
                               }
                             ]
        };