in reply to RE: Hash of hashes question
in thread Hash of hashes question

Using "sort" for min() is expensive, but if you don't have more than 20 entries, it's probably not noticably bad.

-- Randal L. Schwartz, Perl hacker

Replies are listed 'Best First'.
RE:(3) Hash of hashes question
by Russ (Deacon) on Jun 19, 2000 at 06:04 UTC
    True. Of course, in this application, there appear to be a maximum of 24 possible entries...

    Being curious, I did a quick benchmark. However, its results defy logic (my (sort grep keys)[-1] is always faster than a O(1) "max()" algorithm!)

    Since I am obviously missing something, would you be willing to post some benchmarking code showing the efficiency differences between "sort" and "max()" in this context?

    Russ

    P.S. Just for the sake of helping me find my silliness, here is what I was running. Changing the map to larger numbers (to make the source hash bigger) made no difference to the benchmark times.

    #!/usr/bin/perl -w use strict; use Benchmark; my $now = 8; # we'll pretend it's between 8 and 9 PM my %url = ( monday => { @{[map(($_,1), (1..1000))]} } ); timethese(100000, { a => q{ $now = (sort grep {$_ <= $now} keys %{$url{'monday'}})[-1]; }, b => q{ $now = ($now < $_ && $_ < 8 ? $_ : $now) for keys %{$url{'monday'} +}; } });
    The results were the same even without doing the assignment each time in b (only assign when we should, rather than each time through the loop). I left it this way for no apparent reason...

      I think it is the because sort() is very fast, and you are comparing it with an if/then (?:) control structure, which is inherantly slow.

      Well, another thing to do besides benchmarking is profiling:

      #!/usr/bin/perl use strict; use Devel::OpProf qw(profile print_stats zero_stats ); profile(1); # turn on profiling my $now = 8; # we'll pretend it's between 8 and 9 PM my %url = ( monday => { @{[map(($_,1), (1..1000))]} } ); #start profiling zero_stats; $now = (sort grep {$_ <= $now} keys %{$url{'monday'}})[-1]; print_stats; zero_stats; $now = ($now < $_ && $_ < 8 ? $_ : $now) for keys %{$url{'monday'}}; print_stats;

      Well, this tells us a lot!
      First algorithm
      null operation 1006
      block 1001
      private variable1001
      grep iterator 1000
      numeric le (<=) 1000
      scalar variable 1000
      pushmark7
      next statement5
      glob value 3
      constant item 2
      subroutine entry2
      private hash1
      private array 1
      sort1
      keys1
      conditional expression 1
      list slice1
      hash dereference 1
      hash element 1
      array dereference 1
      scalar assignment 1
      block entry 1
      grep 1
      Second algorithm
      private variable 3000
      null operation 2006
      logical and (&&) 2001
      glob value 1996
      numeric lt (<) 1992
      scalar dereference 1992
      next statement1006
      conditional expression 1001
      foreach loop iterator 1001
      scalar assignment 1000
      iteration finalizer 1000
      constant item 993
      pushmark 4
      subroutine entry2
      block 1
      keys 1
      foreach loop entry1
      array dereference 1
      block entry 1
      private array 1
      private hash1
      hash dereference1
      hash element 1
      loop exit 1

      So we see that just the control loop by iself takes almost as much "action" as the whole first algorithm

      Paris Sinclair    |    4a75737420416e6f74686572
      pariss@efn.org    |    205065726c204861636b6572
      I wear my Geek Code on my finger.
      
        use Devel::OpProf qw(profile print_stats zero_stats ); profile(1); # turn on profiling

        Wow! Devel::OpProf. My toolbox just got bigger...

        So, the follow-up question: is there a better way to code max() than the control structure I used? The theory is that sort() should be less efficient than max(). There must be a more efficient way to code max(), then. max() should be in O(1), where Perl's sort is (usually) in O(N log N). How can we get the best performance for the "greatest value less than another value" algorithm?

        Russ