Computers can perform so many operations per second, so fast, that they are sometimes considered omnipotent. But an approach that is a little too direct or naive can turn our fastest machine into a cart ...

This is not the discovery of the century, I know, and every good programmer pays attention to the issue, but sometimes I stop to think about how easy it is to turn a well-performing program into a disaster.

Just yesterday, I had to check about 10,000 files in a directory against a database of about 20,000 records: the iterative one-query-per-file solution took an hour; loading the entire table into a hash and then looking up the information took a few seconds.
Not the meaning of life, I know, but it is something I like to meditate on.
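
To make that concrete, here is a minimal sketch of the two approaches, assuming a DBI connection and a table called records keyed by filename (all names and connection details here are hypothetical):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb;host=dbserver', 'user', 'pass',
                        { RaiseError => 1 } );

# Slow: one query per file -- 10,000 round trips to the database server
sub check_files_by_query {
    my @files = @_;
    my $sth = $dbh->prepare('SELECT info FROM records WHERE filename = ?');
    for my $file (@files) {
        $sth->execute($file);
        my ($info) = $sth->fetchrow_array;
        # ... check $file against $info ...
    }
}

# Fast: load the whole table into a hash once, then do in-memory lookups
sub check_files_by_hash {
    my @files = @_;
    my %records;
    my $sth = $dbh->prepare('SELECT filename, info FROM records');
    $sth->execute;
    while ( my ($filename, $info) = $sth->fetchrow_array ) {
        $records{$filename} = $info;
    }
    for my $file (@files) {
        my $info = $records{$file};    # a hash lookup instead of a query
        # ... check $file against $info ...
    }
}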

Replies are listed 'Best First'.
Re: quickness is not so obvious
by roboticus (Chancellor) on Jan 23, 2015 at 13:14 UTC

    DanBev:

    If I write some code and find that it runs much slower than I expected, I first ask myself "What the heck did I do wrong?" I'll then look at the code to see if I did something dumb.

    If the code and algorithm look OK, I'll normally then check my assumptions and write a simple program to "look at" the data, but not do any processing, just to see what the fastest possible time could be. (So in this case, I'd have it simply open and read the 10,000 files, but not really look at the contents or do any writing.) Perhaps getting the data really does take that long. (It may be trawling through a remote filesystem with significant network latency, perhaps some of the data is archived on a tape system and it takes time for the robot to swap tapes to make the data available, etc.)
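
    Something along these lines would do as a rough "just read it" baseline (a sketch only; the directory name is a placeholder and the files are assumed to be plain text):

    use strict;
    use warnings;
    use Time::HiRes qw( gettimeofday tv_interval );

    my $dir   = '/path/to/the/10000/files';   # placeholder directory
    my $start = [gettimeofday];

    opendir my $dh, $dir or die "Can't open $dir: $!";
    for my $file (grep { -f "$dir/$_" } readdir $dh) {
        open my $fh, '<', "$dir/$file" or die "Can't read $dir/$file: $!";
        1 while <$fh>;    # pull every line through, but do nothing with it
        close $fh;
    }
    closedir $dh;

    printf "Read everything in %.3f seconds\n", tv_interval( $start, [gettimeofday] );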

    So if there's little time difference in simply reading the data and doing the processing, I'll ask the operations team about improving performance. If the difference is *huge*, then it's time to profile your code and find out where it's spending all its time. Sometimes you'll find yourself doing something silly (oops! I'm opening and scanning the contents of file X inside of a loop where I'm processing file Y line-by-line). Sometimes you might find a regex that's performing poorly for some odd data. Or you may find that the algorithm you used is even less performant than expected, and you need to use/invent a better one.
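
    For the profiling step, Devel::NYTProf is one convenient option (assuming it is installed from CPAN); a typical run looks roughly like this:

    $ perl -d:NYTProf fibo.pl    # run the script under the profiler; writes nytprof.out
    $ nytprofhtml                # generate an HTML report from nytprof.out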

    As an example of the latter, I was surprised a few years ago when I found just how slow the recursive method for computing Fibonacci numbers really is:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Time::HiRes qw( gettimeofday tv_interval );

    for my $i (1 .. 100) {
        my $start            = [gettimeofday];
        my $fibonacci_number = fibo($i);
        my $interval         = tv_interval( $start, [gettimeofday] );
        print "fib($i)=$fibonacci_number ($interval sec)\n";
    }

    sub fibo {
        my $num = shift;
        return 1 if $num < 3;
        return fibo($num-1) + fibo($num-2);
    }

    I knew that this method wasn't terribly efficient, but I was still amazed at how slow it actually was:

    $ perl fibo.pl
    fib(1)=1 (6e-06 sec)
    fib(2)=1 (4e-06 sec)
    fib(3)=2 (7e-06 sec)
    fib(4)=3 (6e-06 sec)
    fib(5)=5 (6e-06 sec)
    fib(6)=8 (1e-05 sec)
    fib(7)=13 (1.6e-05 sec)
    fib(8)=21 (2.4e-05 sec)
    fib(9)=34 (3.5e-05 sec)
    fib(10)=55 (5.6e-05 sec)
    fib(11)=89 (8.9e-05 sec)
    fib(12)=144 (0.000145 sec)
    fib(13)=233 (0.000291 sec)
    fib(14)=377 (0.000379 sec)
    fib(15)=610 (0.000609 sec)
    fib(16)=987 (0.000981 sec)
    fib(17)=1597 (0.001583 sec)
    fib(18)=2584 (0.002644 sec)
    fib(19)=4181 (0.00414 sec)
    fib(20)=6765 (0.005748 sec)
    fib(21)=10946 (0.004466 sec)
    fib(22)=17711 (0.007214 sec)
    fib(23)=28657 (0.011702 sec)
    fib(24)=46368 (0.018114 sec)
    fib(25)=75025 (0.027751 sec)
    fib(26)=121393 (0.045355 sec)
    fib(27)=196418 (0.072849 sec)
    fib(28)=317811 (0.117512 sec)
    fib(29)=514229 (0.193195 sec)
    fib(30)=832040 (0.307089 sec)
    fib(31)=1346269 (0.505429 sec)
    fib(32)=2178309 (0.796903 sec)
    fib(33)=3524578 (1.312573 sec)
    fib(34)=5702887 (2.093858 sec)
    fib(35)=9227465 (3.422112 sec)
    fib(36)=14930352 (5.594139 sec)
    fib(37)=24157817 (8.969398 sec)
    fib(38)=39088169 (14.46981 sec)
    fib(39)=63245986 (23.423682 sec)
    fib(40)=102334155 (38.201329 sec)
    fib(41)=165580141 (62.30559 sec)
    ^C

    Yeah, like I'm going to sit through it computing the first 100 numbers that way. That's when I looked up Memoize and started using it, which, by the way, is a very nice way to get a performance boost for some programs with little modification. I simply added the following two lines to the top of the program (just after the other 'use' statements):

    use Memoize;
    memoize('fibo');

    To get much better performance:

    $ perl fibo.pl
    fib(1)=1 (8e-06 sec)
    fib(2)=1 (5e-06 sec)
    fib(3)=2 (1.2e-05 sec)
    fib(4)=3 (7e-06 sec)
    fib(5)=5 (6e-06 sec)
    fib(6)=8 (6e-06 sec)
    fib(7)=13 (6e-06 sec)
    fib(8)=21 (6e-06 sec)
    fib(9)=34 (5e-06 sec)
    fib(10)=55 (6e-06 sec)
    . . . <snip> . . .
    fib(98)=1.35301852344707e+20 (6e-06 sec)
    fib(99)=2.18922995834555e+20 (6e-06 sec)
    fib(100)=3.54224848179262e+20 (6e-06 sec)

    Problem solved. Yeah, when you trawl through a deep binary tree *twice* for *each* *level*, it's gonna get slow ... *very* *very* *slow*. Lesson learned, moving on to the next thing....

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      As a side note:

      Fibonacci is one of those theoretical problems that need heavy computation and where the implementation matters.

      But in real life you can approximate the result to a certain accuracy.

      So why would I need more than 10 digits of fib(100), if I ever need fib(100) at all?
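
      For instance, a minimal sketch using Binet's closed-form approximation (plain Perl doubles only give you about 15 significant digits anyway):

      use strict;
      use warnings;

      # Binet's formula: fib(n) ~= phi**n / sqrt(5), rounded to the nearest integer
      sub fibo_approx {
          my $n   = shift;
          my $phi = ( 1 + sqrt 5 ) / 2;
          return int( $phi ** $n / sqrt(5) + 0.5 );
      }

      print fibo_approx(100), "\n";   # ~3.54224848179262e+20 (the last digit or two may wobble)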

      Cheers Rolf

      PS: Je suis Charlie!

      Thanks a lot for your reply.
      Without going into too much detail about algorithms or optimizers, my reflection was precisely about the subtle difference between doing things, doing things right, and doing things better.

      In my case, in detail, I was working from a list of files (I did not need to open them), and what took the time was obviously the queries to a database on another server.
      Cheers

      Hmm, my personal feeling is that if you use Memoize and see a dramatic performance boost with recursive functions like this, then you really need to rethink your algorithm as there is very likely a much better solution.

      (To be fair, I suspect this is not always true; however, it is absolutely worth investigating, especially if the trade-off in memory utilisation is particularly expensive.)

        > then you really need to rethink your algorithm as there is very likely a much better solution

        I doubt this. I've done some branch-and-bound implementations in the past which searched recursively through a graph, and it was crucial for speed to check whether a partial solution had already been calculated.

        Some graph search algorithms avoid memoizing by ordering the input nodes (thus avoiding duplications). In the aforementioned cases this wasn't possible at all!

        Cheers Rolf

        PS: Je suis Charlie!

        update

        For instance, see the Wikipedia article on Dynamic_programming:

        The dynamic programming approach seeks to solve each subproblem only once, thus reducing the number of computations: once the solution to a given subproblem has been computed, it is stored or "memo-ized": the next time the same solution is needed, it is simply looked up. This approach is especially useful when the number of repeating subproblems grows exponentially as a function of the size of the input.

        How do you justify spending even five minutes thinking about how to improve the algorithm when you can solve the issue in 30 seconds and two lines of code? Of course, after you've spent 30 seconds and found that the problem isn't solved, then it's time to think about smarter ways to approach the problem.

        I know where you are coming from. Tricks like Memoize feel like cheating. But if it gets the job done and keeps getting it done then it's probably the right solution.

        Perl is the programming world's equivalent of English
      You could also have cached the calculated data yourself. Something like this several-liner:
      $ time perl -e '
          print fibo(99);
          {
              my @cache;
              sub fibo {
                  my $num = shift;
                  unless (defined $cache[$num]) {
                      if ($num < 3) {
                          $cache[$num] = 1;
                      }
                      else {
                          $cache[$num] = fibo($num-1) + fibo($num-2);
                      }
                  }
                  return $cache[$num];
              }
          }
      '
      2.18922995834555e+20
      real    0m0.036s
      user    0m0.000s
      sys     0m0.031s

      Je suis Charlie.

        Laurent_R:

        Yeah, I know. I used the recursive method on purpose just 'cause it was easy and I was demonstrating recursion to a beginner. (If I recall correctly, I used the Fibonacci series because it was even uglier than the traditional factorial example of recursion.) If I was actually interested in the Fibonacci numbers (I don't recall actually ever needing them for anything), I'd've just created them in a loop, something like:

        my @vals = (1, 1);
        . . .
        sub fibo {
            my $num = (shift) - 1;
            while ($#vals < $num) {
                push @vals, $vals[-1] + $vals[-2];
            }
            return $vals[$num];
        }

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: quickness is not so obvious
by sundialsvc4 (Abbot) on Jan 22, 2015 at 15:54 UTC
    Don’t “diddle” code to make it faster.   Find a better algorithm.

    Kernighan & Plauger:   The Elements of Programming Style

    Proof-positive that great wisdom sometimes comes in tiny books.

      Sometimes, implementation details do make a difference. Even with today's advanced optimizing compilers. And in the embedded systems world, where US$ 0.50 processors rule, the best available compilers often do not have the most advanced optimizers.

      Example (in C):

      i = count;
      j = 0;
      do {
          // something with j
          j++;
      } while (--i);

      is often faster than the common:

      for (j = 0; j < count; j++) { }

      or even:

      for (i = count, j = 0; i; --i, j++) { }

      This is because it's easy for an optimizer to "see" it can safely "replace" while (--i) with a decrement-and-branch-if-not-zero instruction.

      (Also, if your code functions equally correctly with a decrementing "index", you can skip the extra variable and associated increment.)

        If you are worried about dumb compilers, why do you use post-increment operators where the code logic doesn't care whether it's a pre or post operator? Post-increment "requires" an intermediate variable to save the pre-incremented value so it can be returned after the increment.

        This preference for post-increment is pervasive, and I can't think why that should be. To me pre-increment is king because it puts the operator out front where I can see it, instead of hiding the operator behind a variable. I expect most modern compilers would optimise the code if the "return" value's not used in any case, but in an embedded context where the odd nanosecond may be important, defaulting to post-increment seems really odd if you don't trust the compiler.

        Perl is the programming world's equivalent of English

      Echoing what RonW said, I think that particular reasoning is from a pre-high-level language mindset where there aren’t all that many ways to write a piece of code and they will often collapse to a similar number of ops when compiled. There are so many ways to write a line or block or package of Perl that diddling is harder to define… but you should know it when you see it. Apologies to the ghost of Justice Stewart.

Re: quickness is not so obvious
by Laurent_R (Canon) on Jan 24, 2015 at 22:55 UTC
    I fully agree with your post. And I am quite often pis*ed off by posts here in the Monastery in which, facing a performance problem, some monk says: "Use a database". Wrong answer most of the time. A hash can be hundreds or even thousands of times faster than a DB.

    I am dealing almost daily with various databases. The first thing I do, quite often, is to load a number of tables (especially the reference data) into hashes.

    Je suis Charlie.