in reply to RE: RE: Hash of hashes question
in thread Hash of hashes question

True. Of course, in this application, there appear to be a maximum of 24 possible entries...

Being curious, I did a quick benchmark. However, its results defy logic (my (sort grep keys)[-1] is always faster than a O(1) "max()" algorithm!)

Since I am obviously missing something, would you be willing to post some benchmarking code showing the efficiency differences between "sort" and "max()" in this context?

Russ

P.S. Just for the sake of helping me find my silliness, here is what I was running. Changing the map to larger numbers (to make the source hash bigger) made no difference to the benchmark times.

#!/usr/bin/perl -w use strict; use Benchmark; my $now = 8; # we'll pretend it's between 8 and 9 PM my %url = ( monday => { @{[map(($_,1), (1..1000))]} } ); timethese(100000, { a => q{ $now = (sort grep {$_ <= $now} keys %{$url{'monday'}})[-1]; }, b => q{ $now = ($now < $_ && $_ < 8 ? $_ : $now) for keys %{$url{'monday'} +}; } });
The results were the same even without doing the assignment each time in b (only assign when we should, rather than each time through the loop). I left it this way for no apparent reason...

Replies are listed 'Best First'.
RE: RE:(3) Hash of hashes question
by Aighearach (Initiate) on Jun 19, 2000 at 11:56 UTC

    I think it is the because sort() is very fast, and you are comparing it with an if/then (?:) control structure, which is inherantly slow.

    Well, another thing to do besides benchmarking is profiling:

    #!/usr/bin/perl use strict; use Devel::OpProf qw(profile print_stats zero_stats ); profile(1); # turn on profiling my $now = 8; # we'll pretend it's between 8 and 9 PM my %url = ( monday => { @{[map(($_,1), (1..1000))]} } ); #start profiling zero_stats; $now = (sort grep {$_ <= $now} keys %{$url{'monday'}})[-1]; print_stats; zero_stats; $now = ($now < $_ && $_ < 8 ? $_ : $now) for keys %{$url{'monday'}}; print_stats;

    Well, this tells us a lot!
    First algorithm
    null operation 1006
    block 1001
    private variable1001
    grep iterator 1000
    numeric le (<=) 1000
    scalar variable 1000
    pushmark7
    next statement5
    glob value 3
    constant item 2
    subroutine entry2
    private hash1
    private array 1
    sort1
    keys1
    conditional expression 1
    list slice1
    hash dereference 1
    hash element 1
    array dereference 1
    scalar assignment 1
    block entry 1
    grep 1
    Second algorithm
    private variable 3000
    null operation 2006
    logical and (&&) 2001
    glob value 1996
    numeric lt (<) 1992
    scalar dereference 1992
    next statement1006
    conditional expression 1001
    foreach loop iterator 1001
    scalar assignment 1000
    iteration finalizer 1000
    constant item 993
    pushmark 4
    subroutine entry2
    block 1
    keys 1
    foreach loop entry1
    array dereference 1
    block entry 1
    private array 1
    private hash1
    hash dereference1
    hash element 1
    loop exit 1

    So we see that just the control loop by iself takes almost as much "action" as the whole first algorithm

    Paris Sinclair    |    4a75737420416e6f74686572
    pariss@efn.org    |    205065726c204861636b6572
    I wear my Geek Code on my finger.
    
      use Devel::OpProf qw(profile print_stats zero_stats ); profile(1); # turn on profiling

      Wow! Devel::OpProf. My toolbox just got bigger...

      So, the follow-up question: is there a better way to code max() than the control structure I used? The theory is that sort() should be less efficient than max(). There must be a more efficient way to code max(), then. max() should be in O(1), where Perl's sort is (usually) in O(N log N). How can we get the best performance for the "greatest value less than another value" algorithm?

      Russ

        Okay, here is the max() given in Mastering Algorithms with Perl (O'Reilly):

        sub max { # Numbers. my $max = shift; foreach ( @_ ) { $max = $_ if $_ > $max } return $max; }

        So, modifying that to fit into our existing benchmark/profiling code, it looks like:

        foreach ( keys %{$url{'monday'}} ) { $now = $_ if $_ > $now }

        And then, the score:

         grep

        constant item2
        numeric le (<=)1000
        shift1
        sort1
        hash dereference2
        hash element1
        pushmark10
        grep1
        grep iterator1000
        scalar variable2001
        next statement6
        list slice1
        scalar assignment2
        keys1
        subroutine exit1
        block1000
        array dereference1
        glob value5
        private variable3
        subroutine entry4
         if/then/else

        constant item993
        foreach loop entry1
        shift1
        foreach loop iterator1001
        iteration finalizer1000
        hash dereference2
        hash element1
        pushmark7
        scalar variable4992
        next statement1007
        logical and (&&)2001
        scalar assignment1001
        conditional expression1000
        keys1
        subroutine exit1
        loop exit1
        array dereference1
        glob value6
        private variable3
        subroutine entry4
        numeric lt (<)1992
         book max()

        constant item1
        foreach loop entry1
        shift1
        foreach loop iterator1001
        iteration finalizer1000
        hash dereference2
        numeric gt (>)1000
        hash element1
        pushmark7
        scalar variable2292
        next statement2006
        logical and (&&)2001
        scalar assignment147
        keys1
        subroutine exit1
        loop exit1
        array dereference1
        glob value6
        private variable3
        subroutine entry4
        Benchmark: timing 10000 iterations of bench_grep, bench_if_then, bench_max...
        bench_grep: 115 wallclock secs (111.04 usr +  0.01 sys = 111.05 CPU) @ 90.05/s (n=10000)
        bench_if_then: 73 wallclock secs (70.02 usr +  0.01 sys = 70.03 CPU) @ 142.80/s (n=10000)
         bench_max: 64 wallclock secs (62.72 usr +  0.01 sys = 62.73 CPU) @ 159.41/s (n=10000)
        

        Well, we have some surprises here! grep comes in last, even though it has the least operations. As you can see, not all operations are equal... but the book technique is the big winner. Which tells me it was worth the $35US.

        The results also show that Devel::OpProf is a useful tool, but only as an accessory to Benchmark. On it's own, it can be misleading.

        Paris Sinclair    |    4a75737420416e6f74686572
        pariss@efn.org    |    205065726c204861636b6572
        I wear my Geek Code on my finger.