in reply to Common elements of a hash of hashes

The entries are in, let's see how they fare in the much awaited Benchmark competition. ;-)

#!/usr/bin/perl use strict; use warnings; use Benchmark qw(:all); use Storable qw(dclone); my %HoH = ( elt1 => { A => 'ccc', B => 'ccc', C => 'ccc', }, elt2 => { A => 'ccc', C => 'ccc', D => 'ccc', B => 'ccc', }, elt3 => { A => 'ccc', E => 'ccc', C => 'ccc', }, ); sub perl_mouse { my %HoH = %{ dclone(shift) }; my %count; my $count = keys %HoH; while (my ($name, $hash) = each %HoH) { while (my ($key) = each %$hash) { $count{$key}++; } } while (my ($name, $hash) = each %HoH) { while (my ($key) = each %$hash) { delete $$hash{$key} unless $count{$key} == $count; } } } sub skeeve1 { my %HoH = %{ dclone(shift) }; my %count; my $count = scalar keys %HoH; for ( [ grep { $count != $count{$_} } map { ++$count{$_}; $_ } map { keys %{$HoH{$_}} } keys %HoH ] ) { for my $hash (values %HoH) { delete (@$hash{@$_}); } } } sub skeeve2 { my %HoH = %{ dclone(shift) }; my %count; my $count = scalar keys %HoH; my $k = [grep {$count == ++$count{$_}} map {keys %{$HoH{$_}}} keys %HoH]; for my $h (values %HoH) { %$h = map { $_, $h->{$_}} @$k; } } sub blazar { my %HoH = %{ dclone(shift) }; my @keys = keys %HoH; my %saw; for (@keys) { $saw{lc,}++ for keys %{ $HoH{$_} }; } for (@keys) { my $h = $HoH{$_}; $saw{lc,} < @keys and delete $h->{$_} for keys %$h; } } my $benchmark = timethese(5_000, { Perl_Mouse => sub { perl_mouse(\%HoH) }, Skeeve1 => sub { skeeve1(\%HoH) }, Skeeve2 => sub { skeeve2(\%HoH) }, Blazar => sub { blazar(\%HoH) }, } ); print "\n"; cmpthese($benchmark);

Behold the results...

Benchmark: timing 5000 iterations of Blazar, Perl_Mouse, Skeeve1, Skeeve2...
    Blazar:  7 wallclock secs ( 3.61 usr +  1.36 sys =  4.97 CPU) @ 1006.04/s (n=5000)
Perl_Mouse:  7 wallclock secs ( 3.54 usr +  1.36 sys =  4.90 CPU) @ 1020.41/s (n=5000)
   Skeeve1:  7 wallclock secs ( 3.48 usr +  1.35 sys =  4.83 CPU) @ 1035.20/s (n=5000)
   Skeeve2:  8 wallclock secs ( 3.64 usr +  1.35 sys =  4.99 CPU) @ 1002.00/s (n=5000)

             Rate    Skeeve2     Blazar Perl_Mouse    Skeeve1
Skeeve2    1002/s         --        -0%        -2%        -3%
Blazar     1006/s         0%         --        -1%        -3%
Perl_Mouse 1020/s         2%         1%         --        -1%
Skeeve1    1035/s         3%         3%         1%         --

Update: modified the code and benchmark results, which were actually meaningless, thanks to ikegami's wise and knowledgeable advice. And as he rightly predicted, the difference between each method really isn't that huge.

Replies are listed 'Best First'.
Re^2: Common elements of a hash of hashes
by ikegami (Patriarch) on Oct 07, 2005 at 16:20 UTC

    Your results are garbage.

    1) The %HoH you pass as an argument is a global from some package (main?), not the my %HoH which is not visible to that code when its compiled by Benchmark. Instead of
    Perl_Mouse => 'perl_mouse(\%HoH)',
    use
    Perl_Mouse => sub { perl_mouse(\%HoH) },
    to avoid this problem.

    2) The first call of the first function will remove the uncommon elements. For every subsequent call to that function and to the other functions, %HoH will only contains keys common to all hashes. You need to deeply copy %HoH before passing it to the functions.

    Overall, I don't expect a major difference in speed between the different versions. Perl Mouse's will probably be the slowest (since it does all its looping in Perl, where as the others do it in C), but 1) it uses minimal memory, and more importantly, 2) it's the easiest to read. Using foreach values instead of while each will probably speed it up at the expense of memory.

      The first call of the first function will remove the uncommon elements. For every subsequent call to that function and to the other functions, %HoH will only contains keys common to all hashes. You need to deeply copy %HoH before passing it to the functions.

      Alright, I felt this would be an issue, but thought I had resolved it by passing a reference to each sub and then dereferencing it. Unfortunately, I missed the fact that the values of the hash were references, which would modify the original values in place... I plead guilty.

      To deeply copy the structure, I've just read about the dclone function from Storable.pm, I believe this is what's needed.

      I'll rewrite the small script in hope it can be more meaningful and useful than what I previously posted.

Re^2: Common elements of a hash of hashes
by blazar (Canon) on Oct 09, 2005 at 09:12 UTC
    Update: modified the code and benchmark results, which were actually meaningless, thanks to ikegami's wise and knowledgeable advice. And as he rightly predicted, the difference between each method really isn't that huge.
    Also note that my solution does something slightly different than the others. In fact I tried to adhere strictly to the specifications as of the wanted input and output examples given in the first post. Indeed you'll notice that the 'c' key appears both lowercase and uppercase: this seems probable to be just a typo, which is why others plainly ignored it; but since you could deal with it just fine with virtually no more effort, I just did it -- for instructive reasons.
Re^2: Common elements of a hash of hashes
by Anonymous Monk on Oct 07, 2005 at 14:57 UTC
    There is more than one way to do it...
    Thanks very much for your help (and also for the benchmarking). I did not think that my question would create such a competition ;)
    I will need some time to digest some piece of code before understand it, especially the solutions provided by skeeve.