I have a requirement to compare two hashes, just the keys, not values. I came up with the following, but I was wondering (in the spirit of TIMTOWDI) whether there was a neater and/or more efficient way of doing it.
sub compare_hash_keys { my ($h1,$h2) = @_; my @k1 = sort keys %$h1; my @k2 = sort keys %$h2; return @k2 - @k1 if @k1 <=> @k2; for (0..$#k1) { return $k1[$_] cmp $k2[$_] if $k1[$_] cmp $k2[$_]; } 0; }
This could turn into a golf exercise I suppose.

Replies are listed 'Best First'.
Re: Comparing all keys in a hash
by gmax (Abbot) on Mar 20, 2003 at 12:03 UTC

    Actually, the way you are doing it, you are comparing arrays. Therefore you may find useful this node:

    comparing arrays

    The best method there was MeowChow's at (MeowChow) Re2: (Efficiently) comparing arrays

    To take advantage of hashes capabilities, though, I would rather use something like

    #!/usr/bin/perl -w use strict; my %first = ( aa => 1, bb => 2, cc => 3, ee => 6); my %second = ( bb=> 2, cc => 3, dd => 4 ); sub compare_one_hash_keys { my ($h1, $h2) = @_; return grep ( {not exists $h2->{$_} } keys %$h1) } sub compare_both_hash_keys { my ($h1, $h2) = @_; return grep ( {not exists $h2->{$_} } keys %$h1), grep ( {not exists $h1->{$_} } keys %$h2) ; } print compare_one_hash_keys( \%first, \%second), "\n"; print compare_one_hash_keys( \%second, \%first), "\n"; print compare_both_hash_keys( \%first, \%second), "\n"; __END__ prints: aaee dd aaeedd

    The first sub returns an array of keys present in one hash and missing in the other one. The second one combines the differences in both hashes. I think you may use this method to implement what is closer to your needs.

     _  _ _  _  
    (_|| | |(_|><
     _|   
    
Re: Comparing all keys in a hash
by diotalevi (Canon) on Mar 20, 2003 at 11:50 UTC

    A nit - don't sort the arrays until you've done your @k1 <=> @k2 test. If that returns true then you wasted two sort() ops. Also, do your cmp operation once and stash the result in a my() variable. Its fast, easy and saves redundant work.

    sub compare_hash_keys { my ($h1,$h2) = @_; my $test; # A temporary variable for holding a result so it doesn' +t have to be recomputed. my @k1 = keys %$h1; my @k2 = keys %$h2; $test = @k1 <=> @k2 and return $test; for ( 0 .. $#k1 ) { $test = $k1[$_] cmp $k2[$_] and return $test; } return; }
Re: Comparing all keys in a hash
by shotgunefx (Parson) on Mar 20, 2003 at 14:58 UTC
    Try this. Much MUCH faster (The compare_each_keys).
    #!/usr/bin/perl use strict; use warnings; use Benchmark qw(cmpthese timethese); # Randomly generate a hash. my $bench_count = 8000; my $kcount = 150; my ($h1,$h2); $h1->{random_string(2+int(rand() * 15)) }++ for (1..$kcount); $h2->{random_string(2+int(rand() * 15)) }++ for (1..$kcount); print "Benchmarking non-matching hashes\n"; my $results = timethese($bench_count ,{ each_cmp=> sub { compare_each_keys( +$h1,$h2) }, diotalevi_cmp=> sub { compare_diota +levi_keys($h1,$h2) }, iter_cmp=> sub { compare_hash_keys( +$h1,$h2) } } ); cmpthese($results); # Now test when there is a match print "Benchmarking matching hashes\n"; $results = timethese($bench_count ,{ each_cmp=> sub { compare_each_keys( +$h1,$h1) }, diotalevi_cmp=> sub { compare_diota +levi_keys($h1,$h1) }, iter_cmp=> sub { compare_hash_keys( +$h1,$h1) } } ); cmpthese($results); sub compare_each_keys { my ($h1,$h2) = @_; return 0 if keys %$h1 != keys %$h2; while (my $k = each %$h1){ return 0 if ! exists $h2->{$k}; } return 1; } sub compare_diotalevi_keys { my ($h1,$h2) = @_; my $test; # A temporary variable for holding a result so it doesn' +t have to be recomputed. my @k1 = keys %$h1; my @k2 = keys %$h2; $test = @k1 <=> @k2 and return $test; for ( 0 .. $#k1 ) { $test = $k1[$_] cmp $k2[$_] and return $test; } return; } sub compare_hash_keys { my ($h1,$h2) = @_; my @k1 = sort keys %$h1; my @k2 = sort keys %$h2; return @k2 - @k1 if @k1 <=> @k2; for (0..$#k1) { return $k1[$_] cmp $k2[$_] if $k1[$_] cmp $k2[$_]; } 0; } sub random_string { my $length = shift || 2; my @chars = ('a'..'z','A'..'Z',0..9); join ('',map{ $chars[ rand() * @chars ] } (1..$length)); } __END__ Benchmarking non-matching hashes Benchmark: timing 8000 iterations of diotalevi_cmp, each_cmp, iter_cmp +... diotalevi_cmp: 9 wallclock secs ( 8.22 usr + 0.00 sys = 8.22 CPU) @ + 973.24/s (n=8000) each_cmp: 0 wallclock secs ( 0.20 usr + 0.00 sys = 0.20 CPU) @ 40 +000.00/s (n=8000) (warning: too few iterations for a reliable count) iter_cmp: 18 wallclock secs (15.24 usr + 0.00 sys = 15.24 CPU) @ 52 +4.93/s (n=8000) Rate iter_cmp diotalevi_cmp each_cmp iter_cmp 525/s -- -46% -99% diotalevi_cmp 973/s 85% -- -98% each_cmp 40000/s 7520% 4010% -- Benchmarking matching hashes Benchmark: timing 8000 iterations of diotalevi_cmp, each_cmp, iter_cmp +... diotalevi_cmp: 20 wallclock secs (11.76 usr + 0.03 sys = 11.79 CPU) @ + 678.54/s (n=8000) each_cmp: 10 wallclock secs ( 9.63 usr + 0.00 sys = 9.63 CPU) @ 83 +0.74/s (n=8000) iter_cmp: 18 wallclock secs (18.04 usr + 0.00 sys = 18.04 CPU) @ 44 +3.46/s (n=8000) Rate iter_cmp diotalevi_cmp each_cmp iter_cmp 443/s -- -35% -47% diotalevi_cmp 679/s 53% -- -18% each_cmp 831/s 87% 22% --


    -Lee

    "To be civilized is to deny one's nature."
Re: Comparing all keys in a hash
by shotgunefx (Parson) on Mar 20, 2003 at 13:48 UTC
    UPDATE
    Much faster way in the node below .
    If your hashes aren't too large on average I wonder if something like this would be efficient.
    sub compare_keys { my ($h1,$h2); my ($k1,$k2) = ( join('',sort keys %$h1), join('',sort keys %$h2) ) +; return $k1 eq $k2 ? 1 : 0; }
    Of course I know you're not suppose to rely on hash order but would two hashes with the same keys return in the same order? I would think so but don't know (And don't have time to test) but if it did you could skip the sort. But it probably would be a bad idea.

    -Lee

    "To be civilized is to deny one's nature."
    UPDATE
    Fixed typo UPDATE
    Fixed typo, thanks Helter

      If your going to go for the concatenated keys route, then it would be better to use a separator rather than null in the joins to prevent {AA=>1, B=>2} comparing equal to {A=>1, AB=>2}.

      chr(28), the default value of $; is probably as good as any other. Usual caveats apply.

      #! perl -slw use strict; sub cmp_hashes {local $"=$;; #" "@{[ sort keys %{$_[0]} ]}" cmp "@{[ sort keys %{$_[1]} ]}" } my %hashA = ( A=>1, B=>2, C=>3, D=>4 ); my %hashB = ( A=>1, B=>2, C=>3, D=>4 ); my %hashC = ( A=>1, C=>3, D=>4 ); my %hashD = ( A=>1, AB=>2, C=>3, D=>4, E=>5 ); my %hashE = ( AA=>1, B=>2, C=>3, D=>4, E=>5 ); print "hashA cmp hashB =>", cmp_hashes \%hashA, \%hashB; print "hashA cmp hashC =>", cmp_hashes \%hashA, \%hashC; print "hashA cmp hashD =>", cmp_hashes \%hashA, \%hashD; print "hashA cmp hashE =>", cmp_hashes \%hashA, \%hashE; print "hashB cmp hashC =>", cmp_hashes \%hashB, \%hashC; print "hashB cmp hashD =>", cmp_hashes \%hashB, \%hashD; print "hashB cmp hashE =>", cmp_hashes \%hashB, \%hashE; print "hashC cmp hashD =>", cmp_hashes \%hashC, \%hashD; print "hashC cmp hashE =>", cmp_hashes \%hashC, \%hashE; print "hashD cmp hashE =>", cmp_hashes \%hashD, \%hashE; __END__ C:\test>244656 hashA cmp hashB =>0 hashA cmp hashC =>-1 hashA cmp hashD =>1 hashA cmp hashE =>-1 hashB cmp hashC =>-1 hashB cmp hashD =>1 hashB cmp hashE =>-1 hashC cmp hashD =>1 hashC cmp hashE =>-1 hashD cmp hashE =>-1

      Examine what is said, not who speaks.
      1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
      2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
      3) Any sufficiently advanced technology is indistinguishable from magic.
      Arthur C. Clarke.
        slaps($self);
        Good catch. I actually did benchmark the two approaches and diotalevi's is around 25% faster on non matches and the string compare is about 14% faster when hashes match.

        -Lee

        "To be civilized is to deny one's nature."
      I had a few spare moments, and tried using your code shotgunefx. You have a curly brace at the end of your second line and I assume you mean a paren.

      Here's some test code to prove my point:
      #!/usr/bin/perl use strict; use warnings; my %h1; my %h2; my %h3; for( my $count = 0, my $char = 'a'; $char ne 'aaaa'; $count++, $char++ + ) { # print "Adding $char with value of $count to h1\n"; $h1{ $char } = $count; } foreach my $key ( keys %h1) { $h2{ $key } = $h1{$key}; } foreach my $key (reverse sort keys %h1) { $h3{ $key } = $h1{$key}; } #foreach my $key (sort keys %h1) { print "h1: key: $key value: $h1{ +$key}\n";} #foreach my $key (sort keys %h2) { print "h2: key: $key value: $h2{ +$key}\n";} #foreach my $key (sort keys %h3) { print "h3: key: $key value: $h2{ +$key}\n";} my ($k1,$k2) = ( join('',sort keys %h1), join('',sort keys %h2) ); my ($k3,$k4) = ( join('',keys %h1), join('',keys %h2) ); my ($k5,$k6) = ( join('',sort keys %h1), join('',sort keys %h3) ); my ($k7,$k8) = ( join('',keys %h1), join('',keys %h3) ); print "inserted same order, keys sorted compare: " . ($k1 eq $k2 ? 1 : + 0) . "\n"; print "inserted same order, keys not sorted compare: " . ($k3 eq $k4 ? + 1 : 0) . "\n"; print "inserted reverse order, keys sorted compare: " . ($k5 eq $k6 ? +1 : 0) . "\n"; print "inserted reverse order, keys not sorted compare: " . ($k7 eq $k +8 ? 1 : 0) . "\n"; __DATA__ Prints: inserted same order, keys sorted compare: 1 inserted same order, keys not sorted compare: 0 inserted reverse order, keys sorted compare: 1 inserted reverse order, keys not sorted compare: 0
      So to answer your question about sorting, the sort of the keys is necessary or the compare will not work.
      Of course I know you're not suppose to rely on hash order but would two hashes with the same keys return in the same order? I would think so but don't know (And don't have time to test) but if it did you could skip the sort. But it probably would be a bad idea.
      In the current Perl 5 implementation of hashes, you cannot rely of ordering of keys. You may get two canonically identical hashes, which should appear the same, but compare differently without a sort. If the elements were inserted in different sequences, you can see this.

      Try comparing { foo => 1, bar => 5 } with { bar => 5 , foo => 1 }.

      See also this thread

        Actually this does work for me. I think there is an compiler option SHARE_KEYS that I specified when building Perl that might have something to do with it.(I don't remember seeing it in previous version though)

        Regardless, as I originally stated, it's a bad idea to rely on this ordering.

        -Lee

        "To be civilized is to deny one's nature."

      You cannot rely on two hashes with the same keys being in the same order. While I'd guess that the key hash values are the same and (maybe) the same hash buckets - that says nothing about the internal order within a bucket. Any algorithm that depends on hash order is flawed unless you a) know exactly how hash order is implemented and b) are comfortable being tied to that specific implementation.

        I understand which is why I cautioned against it, though I thought if the SHARE_KEYS option was compiled it might work. Just playing Devil's Advocate.

        -Lee

        "To be civilized is to deny one's nature."
Re: Comparing all keys in a hash
by demerphq (Chancellor) on Mar 20, 2003 at 23:28 UTC

    I have to say that I really wonder about this node and thread. The code is interesting in its own way, and the discussion below has some merit I suppose, but I can't help but question a bunch of things.

    First off what does this routine do? It "compares" two hashes. Presumably the hashes are of numbers? Its a litte unclear if they are because sort keys %$h1 sorts them lexicographically. I think you want sort {$a <=> $b} keys %$h1.

    Next is the issue of what you return. You return the difference in the number of elements if they are different, otherwise you return which of the two has a lower element encoded ala <=>. What I wonder is why? Why is a hash that contains a single key, say 100000 smaller than a hash with two keys, say 1,2? Why is a hash with keys (0,1,100,200) smaller than a hash with (0,2,2,2)?

    If I saw this code in production without considerable explanation as to its use and relevance I would earmark it for a rewrite as being misconceived. Either it should be renamed, or it should be refactored, or it should be redone. It just doesnt make sense...


    ---
    demerphq

    <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
      demerphq, I feel I should put you out of your misery.

      This code is part of an application process monitoring GUI, that uses Tk. The hash keys are process IDs, and this sub is used to decide whether anything has changed (hence whether to backtick a ps and poll the application's control files - which are expensive activities).

      In practice, the code only needs a zero or non-zero value, so the spaceship operator like return value is overkill. After writing the code, it seemed like an interesting exercise to see if the code could have been written better, hence my meditation on PM.

      Hope this answers your questions. Your post has prompted me to add some comments to the code.

        Ok. Now I get it. :-) Personally I would return very different results...

        # Determines if two hashes have different keys # If the hashes are different: # In list context returns two array refs, each # containing the missing elements from the other # In scalar context returns an AOA, with a similar # content as above # Otherwise # Returns the empty list or undef as appropriate. # Thus # A true return means they hashes are different, # a false means they have identical keys sub hash_key_diff { my ($x,$y)=@_; my (@x,@y); foreach my $k (keys %$x) { push @y,$k unless exists $y->{$k}; } foreach my $k (keys %$y) { push @x,$k unless exists $x->{$k}; } return @x||@y ? wantarray ? (\@x,\@y) : [\@x,\@y] : () }

        This seems like a much more reasonable return to me, and considering the use you put it to it would seem to be more convenient as well: this would give you a list of processes that have been removed, as well as processes that have started.

        BTW: You didnt overlook my point about the sort order did you?

        Anyway, thanks for the explanation.


        ---
        demerphq


      Well I must say I did find this interesting, mainly because it seemed a fairly trivial problem with many seemingly equivalent solutions but was suprised how much of a difference in speed there was between the different approaches. (Though not likely to be an issue in most programs anyway) As far as my own code, I do find the return value odd but I was following the original.

      -Lee

      "To be civilized is to deny one's nature."
Re: Comparing all keys in a hash
by rir (Vicar) on Mar 21, 2003 at 20:54 UTC
    Is your
    return @k2 - @k2 if @k1 <=> @k2;
    an artifact of optimization? Or is this part of your definition of comparing the keys of two hashes?