shan_emails has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I need to compare two hashes and generate their union.

The code below also updates the h1 hash, but I don't want h1 to change. Please point out where I went wrong.

$h1 = {
    'drug comparison' => {
        '7003000.xml'  => { 'entity' => 'a1, a2, a3' },
        '70037559.xml' => { 'entity' => 'x1, x2, x3' },
    },
};

$h2 = {
    'drug comparison' => {
        '7004562.xml'  => { 'entity' => 'z1, z2, z3' },
        '70037559.xml' => { 'entity' => 'e1, e2, e3' },
    },
};

$h3 = union_of_h1_h2($h1, $h2);
print $h3;

sub union_of_h1_h2 {
    my ($ai, $reference) = @_;
    my %output = %$ai;
    my %ref    = %$reference;
    foreach my $_ref (keys %ref) {
        unless (exists $output{$_ref}) {
            $output{$_ref} = $ref{$_ref};
        }
        else {
            foreach my $filename (keys %{ $ref{$_ref} }) {
                if (!exists $output{$_ref}{$filename}) {
                    $output{$_ref}{$filename} = $ref{$_ref}{$filename};
                }
            }
        }
    }
    return \%output;
}


The output hash h3 should be:

$h3 = {
    'drug comparison' => {
        '7003000.xml'  => { 'entity' => 'a1, a2, a3' },
        '70037559.xml' => { 'entity' => 'x1, x2, x3' },
        '7004562.xml'  => { 'entity' => 'z1, z2, z3' },
    },
};


Thanks in advance,
Shanmugam A.

Re: Generate union of the two hashes
by kennethk (Abbot) on Sep 08, 2010 at 14:07 UTC
    As the thing you want to output is a hash ref, the statement print $h3 will output something like HASH(0x182f384). You will get something closer to your expected output by using Data::Dumper, swapping your print statement for use Data::Dumper; print Dumper $h3;. See How can I visualize my complex data structure? for more information.
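
    To illustrate the difference, here is a minimal sketch that prints the same reference both ways. The data is a trimmed version of the hash from the question, and setting $Data::Dumper::Sortkeys is my own addition so the dump is stable between runs.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort keys so the dump looks the same on every run (optional).
$Data::Dumper::Sortkeys = 1;

my $h3 = { 'drug comparison' => { '7003000.xml' => { 'entity' => 'a1, a2, a3' } } };

# Stringifying a reference only yields its type and address.
my $plain = "$h3";

# Dumper walks the structure and returns Perl source that describes it.
my $dump = Dumper($h3);

print "$plain\n";
print $dump;
```

    The first print shows only the reference's type and address; the second shows the nested keys and values.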

    You are clobbering your initial hash because your structure is a hash of hashes: %output and %$ai are different hashes at the top level, but both hold the same object (a hash reference) under the key drug comparison. To make a deep copy, I usually use Data::Dumper again, this time in an eval in a do:

    use Data::Dumper;

    my $h1 = {
        'drug comparison' => {
            '7003000.xml'  => { 'entity' => 'a1, a2, a3' },
            '70037559.xml' => { 'entity' => 'x1, x2, x3' },
        },
    };

    # Dumper emits "$VAR1 = {...};", so the eval fills the lexical $VAR1
    my $copy = do { my $VAR1; eval Dumper $h1; $VAR1; };
    I would suggest making the copy in your union_of_h1_h2 sub.
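
    The sharing described above can be checked directly. This is a minimal sketch, not the code from the question: it uses dclone from the core Storable module as one alternative way to get a deep copy, and the keys new.xml and leak.xml are made up for the demonstration.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $h1 = { 'drug comparison' => { '7003000.xml' => { 'entity' => 'a1, a2, a3' } } };

# Shallow copy: a new top-level hash, but the inner hash reference is shared.
my %shallow = %$h1;
my $shared  = $shallow{'drug comparison'} == $h1->{'drug comparison'};

# Deep copy: every nested hash is duplicated, so writes stay in the copy.
my $deep = dclone($h1);
$deep->{'drug comparison'}{'new.xml'} = { 'entity' => 'n1' };    # hypothetical key
my $h1_untouched = !exists $h1->{'drug comparison'}{'new.xml'};

# Writing through the shallow copy changes $h1 as well.
$shallow{'drug comparison'}{'leak.xml'} = { 'entity' => 'l1' };  # hypothetical key
my $h1_clobbered = exists $h1->{'drug comparison'}{'leak.xml'};

print "inner ref shared by shallow copy: ", ($shared        ? "yes" : "no"), "\n";
print "h1 untouched by deep copy write:  ", ($h1_untouched  ? "yes" : "no"), "\n";
print "h1 changed by shallow copy write: ", ($h1_clobbered  ? "yes" : "no"), "\n";
```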
      Hi All,

      Thanks for all your answers.

      Now I use the Clone module, so the copied hash has a new address and the changes affect only the h3 hash.

      use Clone qw(clone);

      $h1 = {
          'drug comparison' => {
              '7003000.xml'  => { 'entity' => 'a1, a2, a3' },
              '70037559.xml' => { 'entity' => 'x1, x2, x3' },
          },
      };

      $h2 = {
          'drug comparison' => {
              '7004562.xml'  => { 'entity' => 'z1, z2, z3' },
              '70037559.xml' => { 'entity' => 'e1, e2, e3' },
          },
      };

      $h3 = union_of_h1_h2($h1, $h2);
      print $h3;

      sub union_of_h1_h2 {
          my ($ai, $ref) = @_;
          my $output = clone($ai);
          foreach my $_ref (keys %$ref) {
              unless (exists $output->{$_ref}) {
                  $output->{$_ref} = $ref->{$_ref};
              }
              else {
                  foreach my $filename (keys %{ $ref->{$_ref} }) {
                      if (!exists $output->{$_ref}->{$filename}) {
                          $output->{$_ref}->{$filename} = $ref->{$_ref}->{$filename};
                      }
                  }
              }
          }
          return $output;
      }


      Wishes
      Shanmugam A.
Re: Generate union of the two hashes
by CountZero (Bishop) on Sep 08, 2010 at 13:44 UTC
    Are you sure your output is correct? You seem to have lost the 'entity' => 'e1, e2, e3' element of 70037559.xml in the union.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Generate union of the two hashes
by MajingaZ (Beadle) on Sep 08, 2010 at 14:41 UTC
    As I am a bum and want to simplify my answer, I'll treat $h1 and $h2 as %h1 and %h2 instead of refs to anon hashes.
    Currently you have %h2 values overwriting the %h1 values; is that how you want your union to work?
    You could instead merge the values: for each key in %h2, check whether %h1 already has a value for that key and merge the two accordingly.
    Also, I presume you are doing something with the %h2 hash after merging in the %h1 data other than just printing, right? If you are only printing, just skip the keys that already exist in the hash that should take precedence.

    One suggestion to make your code more readable and maintainable: don't use reference as a variable name; perhaps use source and target instead. Per Damian, I give my references a _ref suffix so I know what they are when I read them.

    Updated:
    sub union_of_h1_h2 {
        my ($h1_ref, $h2_ref) = @_;
        my %output = %$h1_ref;
        for my $topkey (keys %{$h2_ref}) {
            # $h2_ref is a reference, so nested lookups need the arrow
            for my $filename (keys %{ $h2_ref->{$topkey} }) {
                for my $bottomkey (keys %{ $h2_ref->{$topkey}{$filename} }) {
                    $output{$topkey}{$filename}{$bottomkey} = $h2_ref->{$topkey}{$filename}{$bottomkey}
                        unless (Condition);
                }
            }
        }
        return \%output;
    }

    I'm unclear what condition you want, so I just left it there as Condition.
    You might also want to rename the keys (topkey, filename, bottomkey) to things that actually make sense to you.
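
    For reference, one way to combine the points raised in this thread is sketched below. It assumes the first hash should win on conflicts (which matches the output requested in the question) and that every file entry is a hash reference. The name union_first_wins is hypothetical, and dclone from the core Storable module stands in for the Clone module used above.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical helper: deep-copies $first_ref, then adds every file entry
# from $second_ref that the copy lacks. Neither input is modified.
sub union_first_wins {
    my ($first_ref, $second_ref) = @_;
    my $output = dclone($first_ref);
    for my $topkey (keys %$second_ref) {
        for my $filename (keys %{ $second_ref->{$topkey} }) {
            # Condition: the first hash takes precedence on conflicts.
            next if exists $output->{$topkey}{$filename};
            # Copy the entry so the result shares nothing with the input.
            $output->{$topkey}{$filename} = dclone($second_ref->{$topkey}{$filename});
        }
    }
    return $output;
}

my $h1 = {
    'drug comparison' => {
        '7003000.xml'  => { 'entity' => 'a1, a2, a3' },
        '70037559.xml' => { 'entity' => 'x1, x2, x3' },
    },
};
my $h2 = {
    'drug comparison' => {
        '7004562.xml'  => { 'entity' => 'z1, z2, z3' },
        '70037559.xml' => { 'entity' => 'e1, e2, e3' },
    },
};

my $h3 = union_first_wins($h1, $h2);
print join(", ", sort keys %{ $h3->{'drug comparison'} }), "\n";
```

    If the values from both hashes should be kept for a shared file, as CountZero's question hints, the next if line is the place to merge them instead of skipping.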