nzgrover has asked for the wisdom of the Perl Monks concerning the following question:

I have an array of hashes like:
$var = [
    { 'id_a' => '1', 'id_b' => '5', 'value_x' => '10', 'value_y' => '5' },
    { 'id_a' => '2', 'id_b' => '3', 'value_x' => '20', 'value_y' => '10' },
    { 'id_a' => '2', 'id_b' => '3', 'value_x' => '30', 'value_y' => '20' },
    { 'id_a' => '3', 'id_b' => '7', 'value_x' => '15', 'value_y' => '15' },
];
... ordered by the "id" fields. I wish to find the hashes with duplicate "id"s and "merge" them together. That is, if there are two (or more) hashes with the same "id"s, I want to add the "value"s together and remove the duplicate hashes. So with the input data given above, the resultant data would look like:
$var = [
    { 'id_a' => '1', 'id_b' => '5', 'value_x' => '10', 'value_y' => '5' },
    { 'id_a' => '2', 'id_b' => '3', 'value_x' => '50', 'value_y' => '30' },
    { 'id_a' => '3', 'id_b' => '7', 'value_x' => '15', 'value_y' => '15' },
];
Any ideas?

Cheers,
Trev

update: Made it more like the real world...

Re: Messing about in Arrays of Hashes
by graff (Chancellor) on Sep 21, 2004 at 03:37 UTC
    It sounds like what you really want to do is go from an array of hashes into a single hash:
    my %master_hash;
    for my $anon_hash_ref ( @AoH ) {
        $master_hash{ $$anon_hash_ref{id} } += $$anon_hash_ref{value};
    }
    Now, %master_hash is keyed by the set of unique ids from the AoH, and its values are the sums of the values for matching ids.

    update (because you updated the question while I was making up the initial answer): assuming that "id_a" always relates to "value_x" and "id_b" to "value_y":

    for my $anonhash ( @AoH ) {
        $master_hash{ $$anonhash{id_a} } += $$anonhash{value_x};
        $master_hash{ $$anonhash{id_b} } += $$anonhash{value_y};
    }
    another update: (because the previous update was wrong): Since the "id_a" and "id_b" values in your AoH might "intersect", this will keep them distinct:
    for my $anonhash ( @AoH ) {
        $master_hash{ "a".$$anonhash{id_a} } += $$anonhash{value_x};
        $master_hash{ "b".$$anonhash{id_b} } += $$anonhash{value_y};
    }
    FINAL UPDATE: (sheesh!) Okay, based on your later clarification about the problem, I'd still suggest building a single hash as output, but now it should be either a HoH or HoA (whatever you prefer):
    for my $anonhash ( @AoH ) {
        my $newkey = join '_', 'a', $$anonhash{id_a}, 'b', $$anonhash{id_b};
        # one way (HoH):
        $master_hash{$newkey}{value_x} += $$anonhash{value_x};
        $master_hash{$newkey}{value_y} += $$anonhash{value_y};
        # another way (HoA):
        $master_hash{$newkey}[0] += $$anonhash{value_x};
        $master_hash{$newkey}[1] += $$anonhash{value_y};
    }
    (Of course, you'll want to delete or comment out whichever pair of lines above you don't prefer.)
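    If an array of hashes is still wanted at the end, the HoH form can be turned back into one. A minimal sketch, assuming keys of the form a_<id_a>_b_<id_b> as built by the join above, numeric ids, and no underscores inside the ids; hoh_to_aoh is a made-up name, not something from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild an array of hashes from a HoH whose keys
# look like "a_<id_a>_b_<id_b>". Assumes the ids contain no underscores.
sub hoh_to_aoh {
    my ($master) = @_;
    my @aoh;
    for my $key ( keys %$master ) {
        # Key layout: 'a', id_a, 'b', id_b
        my ( undef, $id_a, undef, $id_b ) = split /_/, $key;
        push @aoh, {
            id_a    => $id_a,
            id_b    => $id_b,
            value_x => $master->{$key}{value_x},
            value_y => $master->{$key}{value_y},
        };
    }
    # Numeric sort on id_a, then id_b (assumes numeric ids).
    return [
        sort { $a->{id_a} <=> $b->{id_a} or $a->{id_b} <=> $b->{id_b} } @aoh
    ];
}

my %master_hash = (
    a_2_b_3 => { value_x => 50, value_y => 30 },
    a_1_b_5 => { value_x => 10, value_y => 5 },
);
my $aoh = hoh_to_aoh( \%master_hash );
print "$_->{id_a}/$_->{id_b}: $_->{value_x}, $_->{value_y}\n" for @$aoh;
```

    Splitting the key back apart is only safe because of the stated assumption about underscores; storing id_a and id_b inside each inner hash avoids that dependency.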
      Sorry about whipping the carpet out like that. I had tried to simplify the real-world problem, of course, and realized after I had posted that I had gone too far.
Re: Messing about in Arrays of Hashes
by bobf (Monsignor) on Sep 21, 2004 at 04:06 UTC

    Any time you want to eliminate duplicates, think "hash". The following code creates a hash of hashes keyed on id_a, summing value_x and value_y as it goes. Then the array of hashes is rebuilt from the data in the HoH, sorted by id_a.

    my %HoH;
    foreach my $hashref ( @AoH ) {
        $HoH{ ${ $hashref }{id_a} }{id_a} = ${ $hashref }{id_a};
        $HoH{ ${ $hashref }{id_a} }{id_b} = ${ $hashref }{id_b};
        $HoH{ ${ $hashref }{id_a} }{value_x} += ${ $hashref }{value_x};
        $HoH{ ${ $hashref }{id_a} }{value_y} += ${ $hashref }{value_y};
    }
    @AoH = sort { ${ $a }{id_a} <=> ${ $b }{id_a} } ( values %HoH );

    This has been tested using your input data. I'm sure there are much more elegant ways of doing this...
    HTH

    Update: Added code below to meet your new criteria, as stated in this reply. The final array is still sorted on id_a. Note: since id_a and id_b are part of the hash key, you could eliminate those individual keys in the HoH, but I left them in for simplicity.

    my %HoH;
    foreach my $hashref ( @AoH ) {
        my $id_ab = ${ $hashref }{id_a} . '_' . ${ $hashref }{id_b};
        $HoH{$id_ab}{id_a} = ${ $hashref }{id_a};
        $HoH{$id_ab}{id_b} = ${ $hashref }{id_b};
        $HoH{$id_ab}{value_x} += ${ $hashref }{value_x};
        $HoH{$id_ab}{value_y} += ${ $hashref }{value_y};
    }
    @AoH = sort { ${ $a }{id_a} <=> ${ $b }{id_a} } ( values %HoH );
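    One possible way to drop the redundant id_a/id_b entries is to nest the hash two levels deep (id_a, then id_b) and rebuild the records when flattening. This is only a sketch of that idea, not the code above; the names %sum and @merged are made up, and the numeric sort assumes numeric ids:

```perl
use strict;
use warnings;

my @AoH = (
    { id_a => 1, id_b => 5, value_x => 10, value_y => 5 },
    { id_a => 2, id_b => 3, value_x => 20, value_y => 10 },
    { id_a => 2, id_b => 3, value_x => 30, value_y => 20 },
    { id_a => 3, id_b => 7, value_x => 15, value_y => 15 },
);

# Two-level hash: first level is id_a, second level is id_b.
# The inner hash holds only the running sums.
my %sum;
for my $rec (@AoH) {
    my $slot = $sum{ $rec->{id_a} }{ $rec->{id_b} } ||= {};
    $slot->{value_x} += $rec->{value_x};
    $slot->{value_y} += $rec->{value_y};
}

# Flatten back to an array of hashes, ordered by id_a, then id_b.
my @merged;
for my $id_a ( sort { $a <=> $b } keys %sum ) {
    for my $id_b ( sort { $a <=> $b } keys %{ $sum{$id_a} } ) {
        push @merged, { id_a => $id_a, id_b => $id_b, %{ $sum{$id_a}{$id_b} } };
    }
}

print scalar(@merged), " records after merging\n";
```

    Nesting also sidesteps any worry about the separator character appearing inside an id, and it gives a secondary ordering on id_b for free.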
Re: Messing about in Arrays of Hashes
by tachyon (Chancellor) on Sep 21, 2004 at 03:53 UTC

    The simplest approach is to use a temporary array but you could splice if memory is an issue.

    $var = [
        { 'id_a' => '1', 'id_b' => '5', 'value_x' => '10', 'value_y' => '5', },
        { 'id_a' => '2', 'id_b' => '3', 'value_x' => '20', 'value_y' => '10', },
        { 'id_a' => '2', 'id_b' => '3', 'value_x' => '30', 'value_y' => '20', },
        { 'id_a' => '3', 'id_b' => '7', 'value_x' => '15', 'value_y' => '15', },
    ];

    my $tmp;
    my $last_id = '';
    for my $hash( @$var ) {
        if ( $hash->{id_a} eq $last_id ) {
            $tmp->[-1]->{value_x} += $hash->{value_x};
            $tmp->[-1]->{value_y} += $hash->{value_y};
        }
        else {
            $last_id = $hash->{id_a};
            push @$tmp, $hash;
        }
    }

    use Data::Dumper;
    print Dumper $tmp;
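    The loop above compares only id_a and pushes the original hash references, so the sums are also written back into $var. Here is a sketch of the same adjacent-merge idea that compares both ids and works on copies. The name merge_adjacent is made up, and it assumes the input is ordered so that duplicates sit next to each other:

```perl
use strict;
use warnings;

# Hypothetical helper: merge neighbouring records whose id_a AND id_b match.
# Relies on the input being ordered so that duplicates are adjacent.
sub merge_adjacent {
    my ($records) = @_;
    my @merged;
    for my $rec (@$records) {
        my $last = $merged[-1];    # undef on the first pass
        if (   $last
            && $last->{id_a} eq $rec->{id_a}
            && $last->{id_b} eq $rec->{id_b} )
        {
            $last->{value_x} += $rec->{value_x};
            $last->{value_y} += $rec->{value_y};
        }
        else {
            push @merged, {%$rec};    # shallow copy; the input is left untouched
        }
    }
    return \@merged;
}

my $var = [
    { id_a => 1, id_b => 5, value_x => 10, value_y => 5 },
    { id_a => 2, id_b => 3, value_x => 20, value_y => 10 },
    { id_a => 2, id_b => 3, value_x => 30, value_y => 20 },
    { id_a => 3, id_b => 7, value_x => 15, value_y => 15 },
];
my $out = merge_adjacent($var);
printf "%s/%s: x=%s y=%s\n", @{$_}{qw(id_a id_b value_x value_y)} for @$out;
```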

    cheers

    tachyon

Re: Messing about in Arrays of Hashes
by Errto (Vicar) on Sep 21, 2004 at 03:44 UTC

    A first go at it, that will work even if the ids are not sorted (assuming we start with $var):

    my %hash;
    $hash{ $_->{id} } += $_->{value} for @$var;
    $var = [ map { +{ id => $_, value => $hash{$_} } } sort keys %hash ];

    Second go, based on your initial assumption, possibly more efficient but uglier for sure:

    my $lastind = 0;
    my $i = 1;
    while ($i < @$var) {
        if ($var->[$i]->{id} eq $var->[$lastind]->{id}) {
            $var->[$lastind]->{value} += $var->[$i]->{value};
            splice @$var, $i, 1;    # the next element slides into slot $i, so don't advance
        }
        else {
            $lastind = $i++;
        }
    }
Re: Messing about in Arrays of Hashes
by graff (Chancellor) on Sep 21, 2004 at 03:56 UTC
    Would you ever have an AoH like the following, and if so, what would be the right thing to do with it?
    { id_a => 1, id_b => 2, value_x => 20,  value_y => 30 },
    { id_a => 3, id_b => 4, value_x => 40,  value_y => 50 },
    { id_a => 1, id_b => 4, value_x => 100, value_y => 200 },
    Would you want "id_a==1" to come out with 120, and "id_b==4" to come out with 250? Or do you need to keep track of distinct "id_a/b" tuples?
      Only if BOTH ids match do I then want to add any corresponding values together and knock out the "duplicate" hash.
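      Under that rule, the three records in the example above share no (id_a, id_b) pair, so nothing would be merged. A quick check that counts records per pair; the "\0" separator is an assumption (any string that cannot occur in an id will do), and %count and @dupes are made-up names:

```perl
use strict;
use warnings;

my @AoH = (
    { id_a => 1, id_b => 2, value_x => 20,  value_y => 30 },
    { id_a => 3, id_b => 4, value_x => 40,  value_y => 50 },
    { id_a => 1, id_b => 4, value_x => 100, value_y => 200 },
);

# Count how many records fall under each (id_a, id_b) pair.
my %count;
$count{ join( "\0", @{$_}{qw(id_a id_b)} ) }++ for @AoH;

# Pairs that occur more than once are the ones that would be merged.
my @dupes = grep { $count{$_} > 1 } keys %count;
printf "%d distinct pairs, %d duplicated\n", scalar( keys %count ), scalar @dupes;
# prints "3 distinct pairs, 0 duplicated"
```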
Re: Messing about in Arrays of Hashes
by TedPride (Priest) on Sep 21, 2004 at 08:00 UTC
    The following requires an additional hash and an additional array, but both contain only references to the existing hashes (no copies), so overhead should be extremely low. Enjoy...
    $var = [
        { 'id_a' => '1', 'id_b' => '5', 'value_x' => '10', 'value_y' => '5' },
        { 'id_a' => '2', 'id_b' => '3', 'value_x' => '20', 'value_y' => '10' },
        { 'id_a' => '2', 'id_b' => '3', 'value_x' => '30', 'value_y' => '20' },
        { 'id_a' => '3', 'id_b' => '7', 'value_x' => '15', 'value_y' => '15' }];

    my ($id, %ids, @var2);
    for (my $i = 0; $i <= $#$var; $i++) {
        $id = @$var[$i]->{'id_a'} . ' ' . @$var[$i]->{'id_b'};
        if (!$ids{$id}) {
            push(@var2, @$var[$i]);
            $ids{$id} = @$var[$i];
        }
        else {
            $ids{$id}->{'value_x'} += @$var[$i]->{'value_x'};
            $ids{$id}->{'value_y'} += @$var[$i]->{'value_y'};
        }
    }
    $var = \@var2;

    foreach (@$var) {
        print '[' . $line++ . ']' .
            ' id_a -> ' . $_->{'id_a'} .
            ' id_b -> ' . $_->{'id_b'} .
            ' value_x -> ' . $_->{'value_x'} .
            ' value_y -> ' . $_->{'value_y'} . "\n";
    }
    You can cut the last part if you want - that's only to demonstrate that the code works.
      Additional note - if you wish to edit this code for other arrays of hashes, just change the following lines:

      Creates unique ID:

          $id = @$var[$i]->{'id_a'} . ' ' . @$var[$i]->{'id_b'};

      Merges data:

          $ids{$id}->{'value_x'} += @$var[$i]->{'value_x'};
          $ids{$id}->{'value_y'} += @$var[$i]->{'value_y'};
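      Rather than editing those lines for each new data set, the field names could be passed in as parameters. A rough sketch of that generalisation, not part of the code above: merge_records and its arguments are hypothetical, it merges into copies instead of the original hashes, and it uses "\0" as the separator on the assumption that it never appears in an id:

```perl
use strict;
use warnings;

# Hypothetical generalisation: $key_fields names the columns that identify
# a record, $sum_fields names the columns to add up. Output keeps the order
# in which each key was first seen.
sub merge_records {
    my ( $records, $key_fields, $sum_fields ) = @_;
    my ( %seen, @out );
    for my $rec (@$records) {
        # Creates the unique ID from whichever key fields were requested.
        my $id = join "\0", map { $rec->{$_} } @$key_fields;
        if ( my $first = $seen{$id} ) {
            # Merges data into the first record seen with this ID.
            $first->{$_} += $rec->{$_} for @$sum_fields;
        }
        else {
            my $copy = {%$rec};    # shallow copy; the input is not modified
            $seen{$id} = $copy;
            push @out, $copy;
        }
    }
    return \@out;
}

my $var = [
    { id_a => 1, id_b => 5, value_x => 10, value_y => 5 },
    { id_a => 2, id_b => 3, value_x => 20, value_y => 10 },
    { id_a => 2, id_b => 3, value_x => 30, value_y => 20 },
    { id_a => 3, id_b => 7, value_x => 15, value_y => 15 },
];
my $merged = merge_records( $var, [qw(id_a id_b)], [qw(value_x value_y)] );
print scalar(@$merged), " records after merging\n";
```

      Because it keys on a hash and not on neighbouring elements, this works whether or not the input is sorted.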