in reply to how to speed up program dealing with large numbers?

Change your fib sub to the following. You'll be amazed at the speedup you get:

my %cache;
sub fib {
    my $n = shift;
    return $n if $n < 2;
    return ( $cache{ $n - 1 } ||= fib($n - 1) )
         + ( $cache{ $n - 2 } ||= fib($n - 2) );
}

With your fib(), 1 .. 50 takes 100 minutes and counting. With the above, 1 .. 100 takes less than 1 millisecond.

Once you're done being amazed, see Memoize for the explanation of what it does and why it works.
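As a sketch of the same idea using the core Memoize module (the uncached fib body below is an illustration, not necessarily the original poster's exact code):

```perl
use strict;
use warnings;
use Memoize;

# A plain, uncached recursive Fibonacci.
sub fib {
    my $n = shift;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

# One line turns it into the cached version: Memoize replaces fib()
# in the symbol table, so even the recursive calls hit the cache.
memoize('fib');

print fib(30), "\n";    # 832040, effectively instant
```

The hand-rolled %cache above is exactly what Memoize automates for you.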


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"I'd rather go naked than blow up my ass"

Replies are listed 'Best First'.
Re^2: how to speed up program dealing with large numbers?
by ikegami (Patriarch) on Mar 22, 2010 at 00:53 UTC

    An array would be better suited than a hash since we're dealing with non-negative integer indexes.

    my @fib = ( 0, 1 );
    sub fib {
        my $n = shift;
        return $fib[$n] if $n < 2;
        # //= rather than ||= so the cached value 0 isn't treated as a miss
        return ( $fib[ $n - 1 ] //= fib($n - 1) )
             + ( $fib[ $n - 2 ] //= fib($n - 2) );
    }

    Sub calls are expensive in Perl, and there's no need for recursion here, so let's eliminate them:

    my @fib = ( 0, 1 );
    sub fib {
        my $n = shift;
        # @fib in scalar context is the next unfilled index
        $fib[$_] = $fib[$_-2] + $fib[$_-1] for @fib .. $n;
        return $fib[$n];
    }
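    Since the thread is about large numbers, note that beyond roughly fib(78) the values exceed native floating-point precision. Seeding the cache with Math::BigInt (my suggestion, not part of the original reply) keeps the results exact:

```perl
use strict;
use warnings;
use Math::BigInt;

# Iterative, cached Fibonacci as above, but seeded with BigInt so
# large values stay exact instead of degrading to floating point.
my @fib = ( Math::BigInt->new(0), Math::BigInt->new(1) );
sub fib {
    my $n = shift;
    # @fib in scalar context is the next unfilled index, so this
    # loop only computes entries not already cached.
    $fib[$_] = $fib[$_-2] + $fib[$_-1] for @fib .. $n;
    return $fib[$n];
}

print fib(100), "\n";    # 354224848179261915075
```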
      An array would be better suited than a hash since we're dealing with non-negative integer indexes.

      Even in this specific case, the difference in either performance or memory is so marginal as to be essentially unmeasurable.

      For 1 .. 1000, both take around 1.3 seconds and negligible amounts of memory, whereas the original implementation would take months, if not years, longer than the universe has existed!

      But using a hash for caching makes far more sense because of its generality.
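      To sketch that generality (the slow_lookup name and the string keys are my illustration, not from the thread): the same caching pattern works unchanged for any function whose arguments would make poor array indexes.

```perl
use strict;
use warnings;

my %cache;

# Any function keyed by an arbitrary string can use the same pattern;
# an array index would not work here at all.
sub slow_lookup {
    my $key = shift;
    return $cache{$key} //= do {
        # Stand-in for an expensive computation on the key.
        length($key) * 2;
    };
}

print slow_lookup("hello"), "\n";    # computed once: 10
print slow_lookup("hello"), "\n";    # served from the cache: 10
```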

      Giving up that generality to move from being 1e+196 times faster to being 1e+196 + 1 times faster is simply pointless. To put that into perspective, it means your changes on top of mine will make a difference of approximately:

      0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000009999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001%

      Worth the effort?

      Sub calls are expensive in Perl, and there's no need for recursion here, so let's eliminate them:

      Again, an utterly pointless exercise.

      When performing the range 1 .. 1000 with the cached version, the fib function is called 3005 times. That's two calls per iteration from the main loop, leaving roughly one recursive call per iteration. Compare that to the original implementation, which would call fib() ~1e200 times. That's:

      100,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000

      So any additional savings are so utterly marginal as to be completely unmeasurable.
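      The call count is easy to check empirically; here is a sketch that adds a counter (my addition) to the cached fib:

```perl
use strict;
use warnings;

my $calls = 0;
my %cache;
sub fib {
    my $n = shift;
    $calls++;
    return $n if $n < 2;
    return ( $cache{ $n - 1 } ||= fib($n - 1) )
         + ( $cache{ $n - 2 } ||= fib($n - 2) );
}

fib($_) for 1 .. 1000;
# A few thousand calls in total: linear in N, not the ~1e200 of the
# uncached version.
print "fib() was called $calls times\n";
```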



        Absolutely no doubt that using the hash is a million times better than my original implementation. In my original, if you started at the 20th value and just did a range of 5, it would take forever. With the hash I've barely been able to get it to hang for more than a second or two, and that's with a range of 100.

        Worth the effort?

        What effort? I would have used an array from the start, meaning it took extra effort to use the hash version. Besides, we're talking about a 5 character difference.

        Compare that to the original implementation

        Why? When benchmarking video cards, you don't compare them to a CGA video card, you compare them to each other.

        Just because you've made your algorithm 1e200 times faster does not mean that you can't improve it any further. It just means you are in a whole new performance scope.

        The other week, I moved from an O(n^2) to an O(n) algorithm and saved an arbitrarily large amount of time.
        (On the old "patience" sample data, it was a 30,000% speedup after the extra overhead costs. On the new "quick" sample data it would have been a 120,000% improvement. There is no sample big enough to qualify as "patience" anymore.)
        Then the next week, I made another 2x improvement.

        If you think about it as running in 0.0004 time vs 0.0008 time, you're playing to the pessimist in you with math and psychology tricks.
        That extra 2x improvement was still well worth doing!

Re^2: how to speed up program dealing with large numbers?
by Solarplight (Sexton) on Mar 21, 2010 at 23:42 UTC

    Awesome, oh yeah, I am amazed!! Huge difference!! Huge, huge, huge difference.

    I still need to read about Memoize more thoroughly to fully understand it, I think. It looks as though it could certainly be a useful tool. As it pertains to my program, though, the concept is ingeniously simple: put the values in a hash instead of recalculating them a bunch of times. Freakin' great.