Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Which way is faster?

by dws (Chancellor)
on Sep 09, 2002 at 06:16 UTC ( [id://196176]=perlmeditation: print w/replies, xml ) Need Help??

Many years ago, when it was common for companies to send programmers off to workshops, a friend attended one for PL/1 programmers. During this workshop, people were grouped into teams, and given a problem to solve. One of the teams got off onto a discussion about the most efficient way to code a particular type of loop. (In PL/1, like Perl, there is often more than one way to do something.)

The discussion got rather heated, with lots of handwaving and citing of expert opinion. When the workshop broke for the evening, the argument continued in the hotel bar. A few people were heard discussing the merits of alternate loop strategies the next morning at breakfast.

By this point, the other teams were curious. For an argument this big, there must be something important going on -- some key technical learning to take home and impress the boss with. So, when the "most efficient loop" team presented their barely completed solution, everyone paid a lot of attention.

Their solution used the loop type they'd been arguing about exactly once, in initialization code. For all of the hours, and extra hours, they'd spent arguing about efficiency, the effect on the performance of their (barely completed) solution was negligible. The other teams hadn't gotten sidetracked, and all produced more elegant solutions.


This was many years ago. Since then, technology has advanced considerably. It's a lot easier now to prototype benchmark alternatives. Human nature, though, hasn't changed, and the lesson from this workshop of many years ago is just as true now as it was then:

Before you spend a lot of time and effort arguing over performance, make sure it matters.

Replies are listed 'Best First'.
Structure is more important than speed
by Brutha (Friar) on Sep 09, 2002 at 08:29 UTC
    I agree with the point profile before optimize. I would even go further:

    1. 98.765% of all programming task do not need a special focus on the fastest code.

    2. Most times the compiler does a better optimization job than you. (What does the Perl interpreter?)

    3. It is more important to spend time on a clean program structure to enable support.

    Ad 1)
    What type of programs or script does the normal human write? Does it really matter if a script finishes after 5 minutes or 4min55 ? If an optimization gives 3m to a program, it was formerly bad designed and should not be optimized, but restructured. Maybe GUI-callbacks to verify an entry? If you check every field against a database it will be slow, but as above, redesign not optimization is necessary. The rest are special cases, like games, os-kernels, often/many running scripts, Perl interpreters etc.

    Ad 2)
    Since the time I analyzed assembler output of a C compiler to find bugs in the RT-Library, I leave most things to the compiler. It made no difference wether to dereference a pointer over some levels or use a local variable and one line of code more, both resulted in the same code using CPU registers. (Any knowledge, what Perl does?) But the second version was readable, this leads to my next point!

    ad 3)
    Optimized code is most times only readable with the help of the brains who invented that. What if he is not available (well, my experience is that it is most times he, not she) or can't remeber what he thought 6 months ago? Invest weeks to fix a bug? Throw away the code and rewrite it? What estimation do you give your customer, what to your boss about your working results? For some microseconds to rotate a logfile? If you write clean code, well structured constructs (loops) the code-flow becomes understandable and is therefore easier to maintain, which includes maybe optimization.
    Oh, I forgot. Use strict, it is most times more important than your optimization, which work only without strict.

    regards Brutha

    Confession:
    I named Basic Variables 'a1', 'a2' to let the interpreter find them faster. That was about 20 years ago on a apple][

      The Perl interpreter is about as stupid as you can get.

      Optimization is done at compile time, and compile time is seen as run time, so little optimization is done. Shorter variable names and hash keys are faster. If you throw away warnings you can double speed with

      if (defined(my $foo = exists $lookup{$bar})) { .... }
      instead of
      if (exists $lookup{$bar}) { my $foo = $lookup{$bar}; ... }
      In looping through nested data structures you gain speed by pulling out substructures once rather than doing nested lookups. Bonus: this may be clearer.
      for my $foo (sort keys %some_hash) { my $by_foo = $some_hash{$foo}; for my $bar (sort keys %$by_foo) { my $by_bar = $by_foo->{$bar}; for my $baz (sort keys %$by_bar) { ... } } }
      You can travel synchronized data structures in parallel rather than traversing one and doing hash lookups to find the other.

      Good news! Using strict encourages fast lexicals.

      Yes, I have needed to know this. A report took 3 hours per period, and needed to take less to finish the total job each night. There is no point in running more CPU-bound processes than one has CPUs. Moore did not arrive in time. So the code ran a half-dozen times faster with limited damage done. I did not rewrite in C.

        If you throw away warnings you can double speed with
        if (defined(my $foo = exists $lookup{$bar})) { .... }
        instead of
        if (exists $lookup{$bar}) { my $foo = $lookup{$bar}; ... }

        Maybe you can, but it hardly matters as they do two entirely different things. The first sets $foo to either 1 or "" (either of which is always defined.)

        -sauoq
        "My two cents aren't worth a dime.";
        

      I totally agree with you, Brutha. Maybe that's because I never had an imperative need to write a strictly optimized code. Instead, I prefer clarity and I always use several lines, e.g.:

      my $copy = $main_variable ; $copy =~ s/my name/dumb me/g ;

      even when one line would easily do. As long as you don't need to suck every cycle out of your (maybe rather old) CPU, it's better to be slower and readable. Especially when your code is likely to survive when you leave your job :-) IMHO

      Ciao!
      --bronto

      # Another Perl edition of a song:
      # The End, by The Beatles
      END {
        $you->take($love) eq $you->made($love) ;
      }

Re: Which way is faster?
by Zaxo (Archbishop) on Sep 09, 2002 at 08:18 UTC

    Very good topic, there have been several nodes concerned with speed this weekend.

    I'll add that, if you must rework code for speed, Devel::DProf is available to generate data on where you are spending your time.

    That is not to say you should write inefficient code. It is worthwhile to develop the habit and skill of writing code which only does what it must. You can obtain overall speedier code that way without the expense of reworking, or the frustration of optimizing the wrong thing. I hope dws's colleagues were simply taking the opportunity to exercise some of those skills.

    After Compline,
    Zaxo

Re: Which way is faster?
by FoxtrotUniform (Prior) on Sep 09, 2002 at 06:51 UTC

    I recall a Game Developer article about the development of Quake, where John Cash cites a tightly written inner loop in the code... to clear the screen in non-textured mode (which nobody ever really uses, and contributes a minuscule amount of performance even then). By Cash's estimate, this took (Michael Abrash, the original Intel assembly optimization god) a couple days to write.

    Profile before you optimize.

    --
    F o x t r o t U n i f o r m
    Found a typo in this node? /msg me
    The hell with paco, vote for Erudil!

      Oddly enough, the first time I had heard of Knuth's comment that premature optimization was the root of all evil was from John Carmack on a BBS in the early 90's.

      -Lee

      "To be civilized is to deny one's nature."
Re: Which way is faster?
by enoch (Chaplain) on Sep 09, 2002 at 08:24 UTC
    Come on; admit it. You've done it. We all claim to not have, and we all go on and on about the merits, benfits, pros/cons, etc. of optimization. But, sometimes, it just darned fun to do.

    Granted, in your above example, it was more foolish than functional. And, we all know the Knuth quote. But, we have all probably engaged in it in some form of another.

    I think of it like golf -- not something I would ever dream of putting into production code, but fun to do when no one is "watching." Plus, it satisifies, on some levels, curiousity and our desire to judge ourselves against our peers.

    Don't get me wrong, I am not advocating off the wall optimization techniques. I just thought someone should point out the fun in playing optimization speed games.

    Enoch

Re: Which way is faster?
by atcroft (Abbot) on Sep 09, 2002 at 06:45 UTC

    Sounds like premature optimization struck again. Quite a few articles appear in the Super Search when looking for that phrase, and real-world examples hopefully will help nail the lesson home. Thank you for sharing.

Re: Which way is faster?
by rinceWind (Monsignor) on Sep 09, 2002 at 13:09 UTC
    One important point is to gauge the time complexity of the algorithm. This is the most important consideration when deciding how the application will scale.

    An example is Ovid's question 1 in (OT) Interview questions -- your response?. Here, a 'solution' to a matching problem is presented in code, but is order n2.

    Especially where huge databases are involved, getting the algorithm right far outweighs shaving CPU cycles off a calculation. In order to achieve this, it becomes important to think about precisely how an application will scale. If we double the number of customers, how will this affect the run time?

Which way is "better"
by bluto (Curate) on Sep 09, 2002 at 17:00 UTC
    Recently I rediscovered the similar...

    Before you spend a lot of time and effort trying to use a module, make sure it matters.

    I've noticed I tend to over perlify/generify/engineer solutions since I've started to use Perl more frequently.

    During a recent project I ended up needing to repeatably check the uniqueness of about 7.8 million strings of about 15 characters each (stored in a flat file). My first solution, which I rejected after a quick test, was to just populate a hash and make sure I never added an element that already existed. Goodbye swap & CPU.

    The solution seemed to allow using Set::IntSpan, so I installed it and tried it out. Unfortunately, the sets were quite large and insertion time didn't scale (it took several hours to complete). Next I thought I just needed to use some kind of "database". So I tried out one or more of the "builtin" in ones (e.g. SDBM or something similar). Same problem.

    I started thinking about getting more serious and installing SQL or a working version of DB_File or one of the "Sort" modules for sorting largefiles. Chances are good one of these would work, but I was running out of time and needed to get this working.

    I realized that I needed to take a step backward and think about how I would do this without perl. I ended up just using the standard unix 'sort' utility and then having perl just run through the result and check to make sure no two adjacent lines matched. This seems fairly low-tech, but it works in less then 10 minutes with a resonable amount of memory rather than several hours and/or lots of memory (and lots of installation/debugging/etc).

    The project is now over so I don't need something portable. Even if I did, I would probably do the same thing and leave the "optimization" (i.e. a supposedly cleaner solution) until after I had tested this further.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://196176]
Approved by graff
Front-paged by FoxtrotUniform
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (10)
As of 2024-04-16 08:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found