kiz has asked for the wisdom of the Perl Monks concerning the following question:

I've been looking to merge a number of hashes, and found shotgunefx's node from 2002. This is fine, but is an or merge, and I needed the option of an and merge. So, here is my solution:

use Data::Dumper; ### Data preperation $a = { 'pope' => { 'St. Andrews' => { '1' => { 'GB-0227-UYMUN' => 1 }, '2' => { 'GB-0227-UYUY-103' => 1, 'GB-0227-UYUY-101' => 1, } }, }, 'antipope' => { 'St. Andrews' => { '1' => { 'GB-0227-UYMUN' => 1 }, '2' => { 'GB-0227-UYUY-100' => 1 } }, 'Edinburgh' => { '1' => { 'GB-0227-UYMUN' => 1 }, '2' => { 'GB-0227-UYUY-100' => 1 } } }, 'ryaltie' => { 'St. Andrews' => { '2' => { 'GB-0227-UYSL-155' => 1, } } } }; $b = { 'pope' => { 'St. Andrews' => { '1' => { 'GB-0227-UYMUN' => 1 }, '2' => { 'GB-0227-UYUY-101' => 1, } }, }, 'antipope' => { 'Edinburgh' => { '1' => { 'GB-0227-UYMUN' => 1 }, '2' => { 'GB-0227-UYUY-100' => 1 } } }, 'ryaltie' => { 'St. Andrews' => { '2' => { 'GB-0227-UYSL-156' => 1 } } } }; ## a couple of variables to be used to demo the routine $c = {}; $d = {}; # This is a record of the hashrefs we've already processed. This is # simply to stop us getting into a circular reference situation, and # spirling round. we do a "die" so it's evident what's happened. my %SeenMerged = (); sub merge_hashes { my $mode = shift; # "and" or "or" my @hashrefs = @_; # the references to the two hashes to merge # check that we were actually passed a couple of hash references die "Passed a non hashref" if grep { ref $_ ne 'HASH' } @hashrefs; # Set up some variables my %merged = (); # This is the final hash of merged data my %intermediate = (); # this is the hash for an intermediate # stage # @SeenMerged{@hashrefs} produces a list (actually, an array-slice) # of values in %SeenMerged, for the keys defined in @hashref. The # 'grep' checks to see if any of them are references to hashes, and # therefor @seen is a list of the hashrefs we have alredy merged my @seen = grep { ref $_ eq 'HASH' } @SeenMerged{@hashrefs}; if (@seen){ die "contains a circular reference! bailing..."; } # Add the two current hashrefs to our list of seen hashrefs @SeenMerged{@hashrefs} = @hashrefs; # step 1: Create a hash where every key in all the hashes is listed, # and they refer to a anonymous list of values, compiled from the # various hashes in the @hashref list foreach my $h (@hashrefs){ # for each key/value pair while (my ($k,$v) = each %$h ){ push @{$intermediate{$k}}, $v; # push the value onto an # anonymous list indexed by # the key } } ## end of each hashref passed in.. # step 2: for each of the keys in this intermediate hash, grab the # key and the value (a reference to the anonymous list) while (my ($k,$v) = each %intermediate){ # get a list of the values from the anonymous list which are # references to hashes my @hashes = grep { ref $_ eq 'HASH' } @$v; # We have different routines for "and" and "or" searches if ($mode eq "and") { # an "and" search, # If the number of entries in the list of values matches the # number of hashrefs passed in, then we have a value in each of # the hashes listed in the hashref if (@$v == @hashrefs) { # if *all* the values are references, then we need to recurse # down this set of branches if (@hashes == @hashrefs) { my $recursed = merge_hashes($mode, @hashes); # If the merge routine returns a defined value we add it to # the merged data structure. if (defined($recursed)) { $merged{$k} = $recursed; } } else { # leaf node section # check *all* the values match. We do this quite neatly by # turning all the values into a key in a temporary hash. All # matching keys will result in a single key, so if there's # more than one key we know not all keys were the same. my %tmp; @tmp{@$v} = (1)x@$v; if (keys(%tmp) == 1) { # In my application, the leaf nodes are all the same value. $merged{$k} = $v->[0] } } ## end of leaf node section } ## end of "number of values == number of hashrefs passed in" } ## end of $mode eq "and" else { # this is an "or" search # if the values are references, recurse if (@hashes) { $merged{$k} = merge_hashes($mode, @hashes) if @hashes; } else { $merged{$k} = $v->[0] # it's a leaf node (the elementids). # They always have the same value } } ## end of $mode is "or" } ## end of while itterate over %merged # remove the hashrefs we've just merged in delete @SeenMerged{@hashrefs}; # return a reference to the merged hash, or undef if the hash is # empty return keys %merged ? \%merged : undef; } ## end of sub merge_hashes ######################## my @references = ($a, $b); $c = merge_hashes("or", @references); $d = merge_hashes("and", @references); print ("* or *\n",Dumper $c); print ("\n* and *\n",Dumper $d);

Question: Can it be improved? (well, apart from removing all the comments, thus halving the number of lines in the file <grin />
(Once I'm happy with it, I'll pass it onto Michael K. Neylon for possible inclusion into his Hash::Merge package.)


-- Ian Stuart
A man depriving some poor village, somewhere, of a first-class idiot.

janitored by ybiC: Balanced <readmore> tags around longish codeblock