Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
I am facing some problem to find a fast algorithm to keep only the common elements of a hash of hashes.

If we consider this kind of structure (the value "ccc" is not important).
%HoH = ( elt1 => { A => "ccc", B => "ccc", c => "ccc", }, elt2 => { A => "ccc", C => "ccc", D => "ccc", B => "ccc", }, elt3 => { A => "ccc", E => "ccc", C => "ccc", }, );
after treatment, the result would be:
%HoH = ( elt1 => { A => "ccc", c => "ccc", }, elt2 => { A => "ccc", C => "ccc", }, elt3 => { A => "ccc", C => "ccc", }, );
I don't think that is very complex but for the moment I have to do too many loop to obtain this results. I am not very good at algorithm

If someone could help me, it would be nice. Thanks.

Replies are listed 'Best First'.
Re: Common elements of a hash of hashes
by blazar (Canon) on Oct 07, 2005 at 09:31 UTC
    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; my %hoh = ( elt1 => { A => "ccc", B => "ccc", c => "ccc", }, elt2 => { A => "ccc", C => "ccc", D => "ccc", B => "ccc", }, elt3 => { A => "ccc", E => "ccc", C => "ccc", }, ); my @keys=keys %hoh; my %saw; for (@keys) { $saw{lc,}++ for keys %{ $hoh{$_} }; } for (@keys) { my $h=$hoh{$_}; $saw{lc,}<@keys and delete $h->{$_} for keys %$h; } print Dumper \%hoh; __END__
Re: Common elements of a hash of hashes
by Perl Mouse (Chaplain) on Oct 07, 2005 at 08:56 UTC
    #!/usr/bin/perl use strict; use warnings; my %HoH = ( elt1 => { A => "ccc", B => "ccc", C => "ccc", }, elt2 => { A => "ccc", C => "ccc", D => "ccc", B => "ccc", }, elt3 => { A => "ccc", E => "ccc", C => "ccc", }, ); my %count; my $count = keys %HoH; while (my ($name, $hash) = each %HoH) { while (my ($key) = each %$hash) { $count{$key}++; } } while (my ($name, $hash) = each %HoH) { while (my ($key) = each %$hash) { delete $$hash{$key} unless $count{$key} == $count; } } use Data::Dumper; print Dumper \%HoH; __END__ $VAR1 = { 'elt2' => { 'A' => 'ccc', 'C' => 'ccc' }, 'elt3' => { 'A' => 'ccc', 'C' => 'ccc' }, 'elt1' => { 'A' => 'ccc', 'C' => 'ccc' } };
    Perl --((8:>*

      Almost exactly what I wrote...

      How do you like this version? ;-)
      use strict; use warnings; my %HoH = ( elt1 => { A => "ccc", B => "ccc", C => "ccc", }, elt2 => { A => "ccc", C => "ccc", D => "ccc", B => "ccc", }, elt3 => { A => "ccc", E => "ccc", C => "ccc", }, ); my %count; my $count= scalar keys %HoH; for ( [ grep { $count!=$count{$_} } map { ++$count{$_};$_ } map { keys + %{$HoH{$_}} } keys %HoH ] ) { foreach my $hash (values %HoH) { delete(@$hash{@$_}); } } use Data::Dumper; print Dumper(\%HoH);

      $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print

        How do you like this version?

        Not obscure enough for obfuscated code, and too obscure for normal code. I never liked for (SCALAR) {... $_ ...} as an alternative for {my $var = SCALAR; ... $var ...}.

        Perl --((8:>*
Re: Common elements of a hash of hashes
by Skeeve (Parson) on Oct 07, 2005 at 10:06 UTC

    Just to get a bit more obscure...

    #!/usr/bin/perl use strict; use warnings; my %HoH = ( elt1 => { A => "ccc", B => "ccc", C => "ccc", }, elt2 => { A => "ccc", C => "ccc", D => "ccc", B => "ccc", }, elt3 => { A => "ccc", E => "ccc", C => "ccc", }, ); my %count; my $count= scalar keys %HoH; my $k= [grep {$count==++$count{$_}} map {keys %{$HoH{$_}}} keys %HoH]; foreach my $h (values %HoH) { %$h= map {$_,$h->{$_}} @$k; } use Data::Dumper; print Dumper(\%HoH);

    $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print
Re: Common elements of a hash of hashes
by Fang (Pilgrim) on Oct 07, 2005 at 10:45 UTC

    The entries are in, let's see how they fare in the much awaited Benchmark competition. ;-)

    Behold the results...

    Benchmark: timing 5000 iterations of Blazar, Perl_Mouse, Skeeve1, Skeeve2...
        Blazar:  7 wallclock secs ( 3.61 usr +  1.36 sys =  4.97 CPU) @ 1006.04/s (n=5000)
    Perl_Mouse:  7 wallclock secs ( 3.54 usr +  1.36 sys =  4.90 CPU) @ 1020.41/s (n=5000)
       Skeeve1:  7 wallclock secs ( 3.48 usr +  1.35 sys =  4.83 CPU) @ 1035.20/s (n=5000)
       Skeeve2:  8 wallclock secs ( 3.64 usr +  1.35 sys =  4.99 CPU) @ 1002.00/s (n=5000)
    
                 Rate    Skeeve2     Blazar Perl_Mouse    Skeeve1
    Skeeve2    1002/s         --        -0%        -2%        -3%
    Blazar     1006/s         0%         --        -1%        -3%
    Perl_Mouse 1020/s         2%         1%         --        -1%
    Skeeve1    1035/s         3%         3%         1%         --
    

    Update: modified the code and benchmark results, which were actually meaningless, thanks to ikegami's wise and knowledgeable advice. And as he rightly predicted, the difference between each method really isn't that huge.

      Your results are garbage.

      1) The %HoH you pass as an argument is a global from some package (main?), not the my %HoH which is not visible to that code when its compiled by Benchmark. Instead of
      Perl_Mouse => 'perl_mouse(\%HoH)',
      use
      Perl_Mouse => sub { perl_mouse(\%HoH) },
      to avoid this problem.

      2) The first call of the first function will remove the uncommon elements. For every subsequent call to that function and to the other functions, %HoH will only contains keys common to all hashes. You need to deeply copy %HoH before passing it to the functions.

      Overall, I don't expect a major difference in speed between the different versions. Perl Mouse's will probably be the slowest (since it does all its looping in Perl, where as the others do it in C), but 1) it uses minimal memory, and more importantly, 2) it's the easiest to read. Using foreach values instead of while each will probably speed it up at the expense of memory.

        The first call of the first function will remove the uncommon elements. For every subsequent call to that function and to the other functions, %HoH will only contains keys common to all hashes. You need to deeply copy %HoH before passing it to the functions.

        Alright, I felt this would be an issue, but thought I had resolved it by passing a reference to each sub and then dereferencing it. Unfortunately, I missed the fact that the values of the hash were references, which would modify the original values in place... I plead guilty.

        To deeply copy the structure, I've just read about the dclone function from Storable.pm, I believe this is what's needed.

        I'll rewrite the small script in hope it can be more meaningful and useful than what I previously posted.

      Update: modified the code and benchmark results, which were actually meaningless, thanks to ikegami's wise and knowledgeable advice. And as he rightly predicted, the difference between each method really isn't that huge.
      Also note that my solution does something slightly different than the others. In fact I tried to adhere strictly to the specifications as of the wanted input and output examples given in the first post. Indeed you'll notice that the 'c' key appears both lowercase and uppercase: this seems probable to be just a typo, which is why others plainly ignored it; but since you could deal with it just fine with virtually no more effort, I just did it -- for instructive reasons.
      There is more than one way to do it...
      Thanks very much for your help (and also for the benchmarking). I did not think that my question would create such a competition ;)
      I will need some time to digest some piece of code before understand it, especially the solutions provided by skeeve.
Re: Common elements of a hash of hashes
by Skeeve (Parson) on Oct 07, 2005 at 08:56 UTC
    Here is the idea for an algorithm for that:
    1. count all keys into a new hash
    2. Delete all those entries from the new hash where the count is equal to the number of "eltX" elements
    3. What remains in the new hash are the keys that are to be delete

    $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print
Re: Common elements of a hash of hashes
by Anonymous Monk on Oct 07, 2005 at 16:52 UTC
    Sorry but some syntaxic piece of code are really obscur. i am not as comfortable as you with perl. do you mind giving me a short explanation on the elements hereunder?
    sub perl_mouse { my %HoH = %{+shift}; # I know @_ to get the parameter but not this + one } $saw{lc,} # what is lc,? grep { $count!=$count{$_} } map { ++$count{$_};$_ } map { keys %{$HoH{ +$_}} } keys %HoH # the result of this line is stored in which variable?
    thanks
      my %HoH = %{+shift};

      Did I miss something? I don't see the word "shift" in the entire thread. Anyway, that takes the first parameter, dereferences it as a hashref and assigns the resulting hash to %HoH. The plus sign is needed for preventing Perl to take $shift as a reference and instead use the shift function. shift without arguments operates on @_ (UPDATE: or @ARGV depending on the scope).

      lc is "lowercase". I'm not sure about the comma.

      grep { $count!=$count{$_} } map { ++$count{$_};$_ } map { keys %{$HoH{$_}} } keys %HoH
      the result of this line is stored in which variable?

      That returns a list but, since it's surrounded with [ and ], the for loop gets a reference to that list. It's not stored anywhere. It's visible inside the loop as $_.

      I see it the other way around, that is, s/obscure/brilliant/.

      --
      David Serrano

        lc is "lowercase". I'm not sure about the comma.
        It's there because of this:
        $ cat foo.pl my %a; $_='fOo'; $a{lc}++; $a{lc()}++; $a{+lc}++; $a{lc,}++; $ perl -MO=Deparse foo.pl my %a; $_ = 'fOo'; ++$a{'lc'}; ++$a{lc $_}; ++$a{lc $_}; ++$a{lc $_}; foo.pl syntax OK
        So it's there for disambiguating too, just as +shift. I chose the latter form because in this context (involving the ++ operator) it seemed to me to be the least obtrusive and visually distracting.
      Sorry but some syntaxic piece of code are really obscur. i am not as comfortable as you with perl. do you mind giving me a short explanation on the elements hereunder?
      sub perl_mouse { my %HoH = %{+shift}; # I know @_ to get the parameter but not this + one } $saw{lc,} # what is lc,?
      As a general rule if you see a bareword like those, chances are that they may be built in functions. So, always as a general rule, you'll find that perl ('s excellent documentation) may already give you an answer; just check e.g. (in this case): But since there's a subtle issue involved here, please read also the following replies, including mine to Hue-Bond's.
        Thanks for your accurate explanation. I knew the functions shift and lc. In fact, the tricky points were the comma and the plus.

        I have the habit to read the documentation but I don't see very often this kind of syntax in it. to imagine how works the array reference of a grep followed by 2 map in a loop is not easy for everybody I guess.

        but thank to the monks, everybody can learn.

        I see it the other way around, that is, s/obscure/brilliant/.
        I would say s/obscure/brilliant_but_obuscure/ ;