http://qs1969.pair.com?node_id=344057

JYDawg has asked for the wisdom of the Perl Monks concerning the following question:

Fellow wizards and witches,

I often find myself comparing hashes of file information or options and taking steps according to the differences. Normally I would loop through one of the hashes, removing the items that are equal on both sides, and then combine what's left. However, after toiling with referenced arrays in hashes (e.g. for Template Toolkit), I'm curious: is there another (simpler) way of comparing hashes?

Example:

$given = {
    'Subtype' => [
        { 'url' => 'http://www.google.nl/', 'title' => 'testNL' },
        { 'url' => 'http://www.google.be/', 'title' => 'testBE' }
    ],
    'name' => 'test1'
};
$retrieved = {
    'Subtype' => [
        { 'url' => 'http://www.google.nl/', 'title' => 'testNL' },
        { 'url' => 'http://www.google.be/', 'title' => 'testBE' },
        { 'url' => 'http://www.google.de/', 'title' => 'testBE' }
    ],
    'name' => 'test2',
    'type' => 'test2'
};

The result should be:

$result = {
    'Subtype' => [
        { 'url' => 'http://www.google.de/', 'title' => 'testBE' }
    ],
    'name' => 'test2',
    'type' => 'test2'
};

Thanks,

John

--- Lead me not into temptation for I can find it myself...

Re: find differences between multiple hashes
by kvale (Monsignor) on Apr 10, 2004 at 02:43 UTC
    Comparing two arbitrary hierarchical data structures is a hard problem in general. First, you have to establish a criterion for equivalence. Do hash values have to be exactly the same, or is it the values' contents that must match? Must arrays have exactly the same elements in the same order, or is it good enough that they form equivalent sets? Second, you have to come up with a search strategy.
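
To make the array question concrete, here is a minimal sketch of two possible criteria for flat arrays of strings. The names same_ordered and same_multiset are made up for this illustration; neither comes from the thread or from a module.

```perl
use strict;
use warnings;

# Order-sensitive: same length and the same string at every index.
sub same_ordered {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    for my $i (0 .. $#$x) {
        return 0 unless $x->[$i] eq $y->[$i];
    }
    return 1;
}

# Order-insensitive: the same elements with the same counts (a multiset).
sub same_multiset {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    my %count;
    $count{$_}++ for @$x;
    for my $item (@$y) {
        return 0 unless $count{$item};    # missing, or already used up
        $count{$item}--;
    }
    return 1;
}

my @first  = qw(nl be de);
my @second = qw(de nl be);

print "ordered:  ", same_ordered(\@first, \@second), "\n";
print "multiset: ", same_multiset(\@first, \@second), "\n";
```

The two arrays hold the same elements in a different order, so the ordered check fails and the multiset check passes. Which one is right depends entirely on what the data means.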

    For instance, for a hash of hashes and assuming $retrieved is a superset of $given, the following can be used:

    my $result = {};
    foreach my $main_key (keys %$retrieved) {
        unless (exists $given->{$main_key}) {
            $result->{$main_key} = $retrieved->{$main_key};
            next;
        }
        # The key exists in both; compare the subhashes
        foreach my $sub_key (keys %{ $retrieved->{$main_key} }) {
            $result->{$main_key}{$sub_key} = $retrieved->{$main_key}{$sub_key}
                unless exists $given->{$main_key}{$sub_key}
                    && $given->{$main_key}{$sub_key} eq $retrieved->{$main_key}{$sub_key};
        }
    }
    The idea is that given your data structure and equivalence criteria, you can drill down and simply do comparisons, rather than deletions. This should be quicker. For your particular application, I cannot discern your equivalence criterion, so I'll stop here.
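
As one possible way to drill down through the structure in the original question, here is a recursive sketch. It assumes a particular equivalence criterion that the thread does not confirm: hashes are compared key by key, an array element counts as "already present" if any element of the other array is deeply equal to it, and scalars are compared as strings. The names deep_eq and struct_diff are hypothetical helpers, not part of any module.

```perl
use strict;
use warnings;
use Data::Dumper;

# Deep equality: hashes by key, arrays by position, scalars by string value.
sub deep_eq {
    my ($x, $y) = @_;
    return 0 if ref($x) ne ref($y);
    if (ref($x) eq 'HASH') {
        return 0 unless scalar(keys %$x) == scalar(keys %$y);
        for my $k (keys %$x) {
            return 0 unless exists $y->{$k} && deep_eq($x->{$k}, $y->{$k});
        }
        return 1;
    }
    if (ref($x) eq 'ARRAY') {
        return 0 unless @$x == @$y;
        for my $i (0 .. $#$x) {
            return 0 unless deep_eq($x->[$i], $y->[$i]);
        }
        return 1;
    }
    return 1 if !defined $x && !defined $y;
    return 0 if !defined $x || !defined $y;
    return $x eq $y ? 1 : 0;
}

# Return the parts of $new that have no equal counterpart in $old.
# Returns an empty list when nothing differs.
sub struct_diff {
    my ($old, $new) = @_;
    if (ref($new) eq 'HASH' && ref($old) eq 'HASH') {
        my %out;
        for my $k (keys %$new) {
            if (!exists $old->{$k}) { $out{$k} = $new->{$k}; next; }
            my @d = struct_diff($old->{$k}, $new->{$k});
            $out{$k} = $d[0] if @d;
        }
        return %out ? \%out : ();
    }
    if (ref($new) eq 'ARRAY' && ref($old) eq 'ARRAY') {
        # Keep the elements of $new that match nothing in $old.
        my @out = grep { my $e = $_; !grep { deep_eq($_, $e) } @$old } @$new;
        return @out ? \@out : ();
    }
    return deep_eq($old, $new) ? () : ($new);
}

my $given = {
    'Subtype' => [
        { 'url' => 'http://www.google.nl/', 'title' => 'testNL' },
        { 'url' => 'http://www.google.be/', 'title' => 'testBE' },
    ],
    'name' => 'test1',
};
my $retrieved = {
    'Subtype' => [
        { 'url' => 'http://www.google.nl/', 'title' => 'testNL' },
        { 'url' => 'http://www.google.be/', 'title' => 'testBE' },
        { 'url' => 'http://www.google.de/', 'title' => 'testBE' },
    ],
    'name' => 'test2',
    'type' => 'test2',
};

my ($result) = struct_diff($given, $retrieved);
$Data::Dumper::Sortkeys = 1;
print Dumper($result);
```

Under these assumptions the result holds the google.de entry under 'Subtype' plus 'name' => 'test2' and 'type' => 'test2'. The comparison is one-directional (what $retrieved has that $given lacks or disagrees with), and the result shares references with $retrieved rather than copying them.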

    -Mark

Re: find differences between multiple hashes
by tachyon (Chancellor) on Apr 10, 2004 at 06:42 UTC

    Data::Diff or Struct::Diff will tell you if the structures differ. Drilling down into arbitrary Perl structures to generate a Perl structure of the differences is a task similar to (but more complex than) what Data::Dumper, Data::Denter, or YAML do.
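
If all you need is a yes/no answer, a rough sketch using only the core Data::Dumper module is to serialize both structures with sorted keys and compare the strings. This is a swapped-in technique, not the API of either module named above, and structures_differ is a hypothetical helper name.

```perl
use strict;
use warnings;
use Data::Dumper;

# True when the two structures serialize to different text.
sub structures_differ {
    my ($x, $y) = @_;
    local $Data::Dumper::Sortkeys = 1;    # stable hash key order
    local $Data::Dumper::Indent   = 1;
    return Dumper($x) ne Dumper($y) ? 1 : 0;
}

my $p = { name => 'test1', list => [ 'a', 'b' ] };
my $q = { list => [ 'a', 'b' ], name => 'test1' };
my $r = { name => 'test2', list => [ 'a', 'b' ] };

print structures_differ($p, $q) ? "differ\n" : "same\n";    # same keys and values
print structures_differ($p, $r) ? "differ\n" : "same\n";    # 'name' changed
```

String comparison of dumps is a blunt criterion: it treats array order as significant and says nothing about where the difference is.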

    AFAIK there is no module that currently does this. I would suggest that writing a minimal case to deal with your data (as it sounds like you have) is the best solution short of rethinking the app logic. BTW, your desired result is logically inconsistent (as noted by kvale) and should also contain 'name' => 'test1'.

    cheers

    tachyon

Re: find differences between multiple hashes
by BrowserUk (Patriarch) on Apr 10, 2004 at 07:46 UTC

    Here's a crude (but accurate) method of finding the differences. Reconstructing the required result is left as an exercise for the reader:)

    #! perl -slw
    use strict;
    use Data::Dumper;
    use Algorithm::Diff qw[ diff ];

    my $given = {
        'Subtype' => [
            { 'url' => 'http://www.google.nl/', 'title' => 'testNL' },
            { 'url' => 'http://www.google.be/', 'title' => 'testBE' }
        ],
        'name' => 'test1'
    };

    my $retrieved = {
        'Subtype' => [
            { 'url' => 'http://www.google.nl/', 'title' => 'testNL' },
            { 'url' => 'http://www.google.be/', 'title' => 'testBE' },
            { 'url' => 'http://www.google.de/', 'title' => 'testBE' }
        ],
        'name' => 'test2',
        'type' => 'test2'
    };

    =pod

    The result should be:

    $result = {
        'Subtype' => [
            { 'url' => 'http://www.google.de/', 'title' => 'testBE' }
        ],
        'name' => 'test2',
        'type' => 'test2'
    };

    =cut

    print @$_ for map{ @$_ } diff(
        [ split "\n", Dumper( $retrieved ) ],
        [ split "\n", Dumper( $given ) ]
    );

    __END__
    8:37:10.53 P:\test>344057.pl
    -9 },
    -10 {
    -11 'url' => 'http://www.google.de/',
    -12 'title' => 'testBE'
    -15 'name' => 'test2',
    -16 'type' => 'test2'
    +11 'name' => 'test1'

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      OK, I made it ... no idea why it didn't work in the first place. Thanks.