in reply to Re: Optimisation (global versus singular)
in thread Optimisation isn't a dirty word.

I won't say anything about premature optimization because you've been lectured enough about it :)

Since you brought it up :), and as a proxy for everyone else who did so, I'll respond to the lecturing here.

There is a strange, but verifiable fact about the process of optimisation: The earlier* you do it, the easier, more effective and less troublesome it becomes.

*Here the word "earlier" is deliberately chosen to contrast with the word "premature". You can look up the various definitions for premature, but let's derive our own:

  • 'pre' - preceding, before, early.
  • 'mature' - considered and perfected, ripe, complete, full-grown.

    Now here's the thing. You develop your latest greatest XYZ application. You follow good, sound, SE/CS design and development principles. You eschew optimisations in favour of simple, clear maintainability. You test, document and verify. Your application is mature--it just runs too slowly.

    So now it is time to benchmark and optimise. But to do so, except for the most simple of applications, you have to dissect and re-write large parts of your application, re-write large parts of your test suite, and re-test and re-verify the whole darn thing.

    Retro-fitting optimisations, whether algorithmic or implementation, to an existing complex application is always considerably harder than writing them efficiently as you go.

    The problem with the interpretation of the "premature optimisation" axiom is that, all too often, people fail to see that coding is hierarchical.

    With the exception of the simplest of applications, we code in layers. And the lower down in the hierarchy a layer sits, the more effect its efficiency (or otherwise) has upon the overall efficiency of our application. And once we already have upper-layer dependencies upon those lower layers, improving their efficiency without breaking the layers above becomes far harder.

    It has to be recognised that optimising library code, inner classes and similar lower layers at the time of their creation is not premature! It may be before the maturity of the (first) application that uses them, but optimising a library or class prior to its inclusion in an application is an integral part of its maturation.

    If we are serious about code reuse, we must realise that when we write classes and modules intended to be reusable, it is incumbent upon us to make them as efficient as we can (subject to the bounds of correctness, usability and reasonable maintenance), regardless of whether that efficiency is required by the first application for which they are designed.

    Only in this way can we ensure that those classes, modules and libraries will be efficient enough for the next application that would like to use them. And the one after that.

    Leaving (reasonable, achievable) optimisations as an after-thought, as a last minute "we'll do it if we need to" step in the overall development plan, can lead to huge costs and overruns.

    Imagine trying to retroactively gain, on a case-by-case basis at the application level, the kind of benefits we all derive from the internal optimisation that Perl already has.

    Alternatively, if Perl were able to save the bytecode representation of its code and relocatably reload it, (very effective, mature and clever) hacks like mod_perl would not be necessary+.

    +For their performance benefits, that is. In deference to merlyn's advice about mod_perl's other features.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    Re^3: Optimisation (global versus singular)
    by Anonymous Monk on Oct 25, 2005 at 17:09 UTC
      It has to be recognised that optimising library code, inner classes and similar lower layers at the time of their creation is not premature! It may be before the maturity of the (first) application that uses them, but optimising a library or class prior to its inclusion in an application is an integral part of its maturation.

      This is true only to the extent that the interface to the library is influenced by its implementation, is it not? Isn't the point of libraries and library calls to abstract away all the underlying complexity, and to leave the details open to optimizations?

      And yes, I'm playing devil's advocate for a moment, because I've seen the exact scenarios you describe, where "we'll speed it up if we have to" inevitably led to "we don't have the time/money to re-write the central data structures/premises underlying the application three weeks before our due date". I'm interested in seeing why those assumptions failed (and they did), because in theory, it all should have worked out...

        I like that you've brought this up. You've raised the all-important practical side of software development. Life is about choices - selecting one thing over another. If the time to get the project live is constant, then you may need to choose between an implementation that works, and one that is fast. The warning against premature optimisation is well-deserved: it's far better to be slow and right than it is to be fast and wrong. I can always throw a faster CPU at the problem to speed it up, but I can't always throw hardware at a problem to correct the answer. That requires programmer time - which is far more expensive.

        The product I work on has speed as a key requirement. It must respond within a certain amount of time - and that amount is always lessening. And, just for fun, we keep the hardware constant (or we adjust our requirements based on the new hardware). But then, to meet that requirement, we actually pay a group of people to focus solely on finding bottlenecks and reducing, if not outright eliminating, them. That's just the management-recognised cost of performance. We pay 20 people to get it right, and 2 people to get it fast. Trading off money for performance in the same amount of time.

        If you don't have the budget to get everything done on time, working, and fast enough, I would have to drop "fast enough" every time. It's much easier to make a slow, working application fast than a fast, non-working application correct.

        There are some simple things one can do to keep from getting slow in the first place, e.g., in C++, I encourage my coworkers to do:

        const size_t len = strlen(some_string) + strlen(other_string) + 1;
        char* copy = new char[len];
        memset(copy, '\0', len);
        rather than doing that strlen twice each. (Well, actually, I encourage them to put the new and memset into an inline function.) Is that premature? Perhaps - but I look at it as a way to reduce the chance of errors: rather than duplicating code, I'm refactoring it, in a way that allows the compiler to make its own optimisations. That it allows the compiler to optimise isn't a premature optimisation - it's just a smarter way to do something which happens to be a faster alternative.

        I suppose that means that even within "premature optimisation", there is a lot of fuzziness and leeway as to what constitutes "premature". Or even "optimisation".

          I suppose that means that even within "premature optimisation", there is a lot of fuzziness and leeway as to what constitutes "premature". Or even "optimisation".

          That's for sure! I thought the good thing about using C++ was that you had classes to simplify standard programming tasks like string handling!

          If the overhead of the (presumably well optimized) string classes for the language itself is "too slow", it leaves me wondering what's left in the language that's still "fast enough" to be usable. Wouldn't it be faster to resort to pure C at that point?

          Pardon my ignorance on the subject; I learned C ten years ago, but never got around to learning C++; my books kept getting out of date as the language was revised. :-( Isn't the STL part of the C++ language standard these days, or is that a pending idea? I'm all confused! (as usual...)

        Isn't the point of libraries and library calls to abstract away all the underlying complexity, and to leave the details open to optimizations?

        Formally (as in formality, not as in formerly), yes. But ... :)

        Unfortunately, often it is the abstraction itself that is the source of the problem. Whilst abstraction can be a great boon in isolating the caller from the code called, that isolation can be a double-edged sword when it comes to performance.

        A trivial example: writing an RGB class, based around a blessed hash with RED, GREEN & BLUE elements, with setters and getters, and constructors for building objects from lists of 3 values or from a 24-bit integer, is a perfectly sensible approach. Especially if you wanted the basic manipulations being performed to also work on top of, say, the Windows GDI, which uses 24-bit values for colors. But if you used that class for manipulating the colors in images (eg. Image color swapping with GD), where the underlying API requires you to supply a list of 3 values, the performance hit of the conversions and method calls involved would be excruciating.
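
        A rough sketch of the sort of class I mean (a minimal sketch only; the method names and the 0xRRGGBB packing order are illustrative, not taken from any particular module):

          package RGB;
          use strict;
          use warnings;

          # Construct from a list of three 0-255 values.
          sub new_from_list {
              my( $class, $r, $g, $b ) = @_;
              return bless { RED => $r, GREEN => $g, BLUE => $b }, $class;
          }

          # Construct from a 24-bit integer (0xRRGGBB assumed here).
          sub new_from_int {
              my( $class, $int ) = @_;
              return bless {
                  RED   => ( $int >> 16 ) & 0xFF,
                  GREEN => ( $int >>  8 ) & 0xFF,
                  BLUE  =>   $int         & 0xFF,
              }, $class;
          }

          # Simple getters.
          sub red   { $_[0]{RED}   }
          sub green { $_[0]{GREEN} }
          sub blue  { $_[0]{BLUE}  }

          # Back to a 24-bit integer, for APIs (like the GDI) that want one.
          sub as_int {
              my $self = shift;
              return ( $self->{RED} << 16 ) | ( $self->{GREEN} << 8 ) | $self->{BLUE};
          }

          # A list of three values, for APIs (like GD's) that want that form.
          sub as_list {
              my $self = shift;
              return @{ $self }{ qw( RED GREEN BLUE ) };
          }

          1;

        Every pixel manipulated through as_list or as_int pays for a method call and a conversion, which is exactly where the hit comes from.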

        Once the code has been written around that abstraction at the lower levels, removing it completely, as opposed to substituting another underlying data mechanism whilst retaining the abstraction, would not only mean wholesale changes to the calling code, it would also discard the considerable development and testing time already incurred.

        This is a good example of where optimisation at the lower levels would pay handsome dividends all across the board. For example, were it possible to optimise perl's methods and subroutines further (and, contrasting their performance with the current crop of other, similarly dynamic languages, that doesn't seem likely), then that optimisation would not only deliver greater performance for procedural and OO code, it would also remove a barrier to greater use of abstraction.

        It really is a case, again, of balance: the need to balance the theoretically correct and desirable dictates of formal methodology with the practicalities and limitations of one's environment. The tempting solution of "throw hardware at the problem", which can be so effective for high-profit, low-volume applications (like web servers and DB servers), suddenly becomes completely impractical for applications that sit on the other side of the client-server divide: where the hardware numbers in the hundreds or thousands of units; where the application has a low profit, or more often a net cost, over its life cycle; and where it has to compete for resources with any number of other equally important and, potentially, equally memory- and cycle-hungry applications.

        In a past life I did a good deal of usability testing on a "workstation application". The specification called for a user-perceived response time of 1/10th of a second. In performance testing, this was easily achieved in the first Beta. But when that Beta was rolled out to a limited evaluation group of users, their immediate and strident verdict was "It's too slow!". Performance testing had been carried out on a standard-specification workstation; memory, CPU, graphics card, disc performance, OS type and version were all identical, but still they complained.

        What went wrong was that the test workstations were running nothing else. The software ran quickly because it had all the cycles and all the memory to itself. In the real world, every user had browsers, email clients, WPs and spreadsheets and at least one and often several 3270 emulators running. The net effect was that the combined memory usage pushed the machines into swapping and performance fell dramatically. The projected cost of upgrading the 10,000 machines that would run the software was huge.

        The solution settled upon was to go into the application and remove several layers of abstraction at the lower levels. Essentially, lots of small items that had each been instantiated as individual objects were stored instead as a single, compact C array and accessed directly. The additional cost of breaking the conglomerated items up into their individual fields each time they were accessed was more than offset by the considerable reduction in memory use, which avoided the need to go into swapping on most workstations.
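
        That application wasn't Perl, but the same trade is easy to sketch in Perl: keep all the records in one packed string and unpack a record only when it is accessed, rather than blessing each one into its own hash. The three-integer record layout below is purely hypothetical:

          use strict;
          use warnings;

          # One packed string holds many fixed-size records, instead of one
          # blessed hash per record. Layout (hypothetical): three unsigned
          # 32-bit integers per record.
          my $RECORD = 'N3';                                # pack template for one record
          my $SIZE   = length( pack( $RECORD, 0, 0, 0 ) );  # bytes per record

          my $store = '';

          sub add_record {
              my( $x, $y, $z ) = @_;
              $store .= pack( $RECORD, $x, $y, $z );
              return length( $store ) / $SIZE - 1;          # index of the new record
          }

          sub get_record {
              my $i = shift;
              return unpack( $RECORD, substr( $store, $i * $SIZE, $SIZE ) );
          }

          my $idx = add_record( 1, 2, 3 );
          print join( ' ', get_record( $idx ) ), "\n";      # prints: 1 2 3

        The per-access unpacking costs cycles, but the storage is 12 bytes per record held in a single scalar, rather than a full hash (and a bless) per record.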


    Re^3: Optimisation (global versus singular)
    by adrianh (Chancellor) on Oct 30, 2005 at 13:28 UTC

      Now here's the thing. You develop your latest greatest XYZ application. You follow good, sound, SE/CS design and development principles. You eschew optimisations in favour of simple, clear maintainability. You test, document and verify. Your application is mature--it just runs too slowly.

      So now it is time to benchmark and optimise. But to do so, except for the most simple of applications, you have to dissect and re-write large parts of your application, re-write large parts of your test suite, and re-test and re-verify the whole darn thing.

      The problem in this instance is not lack of optimisation. That's just a symptom. If I have to gut my application and start again because it turns out to be too slow then my development process is completely broken.

      Performance requirements are part of the spec. I should be developing in small end-to-end increments so that I can continually look at whether my application is meeting performance requirements and optimise appropriately.

        I should be developing in small end-to-end increments so that I can continually look at whether my application is meeting performance requirements and optimise appropriately.

        To a great extent, we are in agreement. Contrast this approach with the oft-quoted "Don't optimise until/unless you have to".

        Another aspect of it is that it is not always possible to codify performance requirements in such a way that they can easily be verified during the inner/lower levels of the development process.

        For example, when developing libraries and modules, the developers will have no knowledge of the performance requirements of (future) applications that will call their modules. It is all too easy for no consideration to be given to optimisation at this stage, because the application for which the library is written has very loose or no performance requirements.

        The problem then arises when the next application makes heavy, sustained use of the module and its tardy performance comes to light, but only once the calling application has been structured around the API of the inner module. At that stage, you have already committed enough resources to the calling code (designing and writing enough of it to actually do some level of performance testing) that replacing the library with a faster alternative is painful.

        It is bad enough if you have to re-write and/or optimise the library code whilst retaining a compatible interface--at least you will mostly be able to stick with the design and code already developed that allowed you to discover the problem.

        But the real problem arises when the performance bottleneck is the API itself. As an example, using hash-based objects to create trees & graphs seems natural and obvious in Perl, but when you try to use those objects to build and manipulate broad, deep trees or big graphs, they are very heavy on memory consumption and greatly lacking in performance. Given that many of the classical, useful algorithms for which trees and graphs are suited are computationally expensive (some even NP-complete), the poor performance distinctly limits their (re-)usability.
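
        For illustration only (the field layout here is hypothetical, not taken from any particular module), the contrast is between a hash per node and an array per node:

          use strict;
          use warnings;

          # Hash-based node: natural and readable, but every node carries the
          # overhead of a full hash plus a bless.
          sub new_hash_node {
              my( $value ) = @_;
              return bless { value => $value, left => undef, right => undef },
                  'Tree::HashNode';
          }

          # Array-based node: the same information, with noticeably less
          # memory and faster element access per node.
          use constant { VALUE => 0, LEFT => 1, RIGHT => 2 };
          sub new_array_node {
              my( $value ) = @_;
              return bless [ $value, undef, undef ], 'Tree::ArrayNode';
          }

          # Access compares as $node->{value} versus $node->[VALUE].

        The trouble is that the two forms are not interchangeable once calling code has been written against one of them, which is exactly the API problem above.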

        In the absence of actual performance requirements, the best you can do when developing a library intended for large-scale re-use is to optimise it to the best of your ability, commensurate with the goals of correctness and maintainability:

      • Use the fastest algorithm you can find.
      • Use the fastest coding techniques available.

        Just occasionally, the performance gains to be had by trading clarity and simplicity for an obscure idiom or complex syntax are sufficient that the trade is worth it.

        To be able to do this, having an awareness of which algorithms and techniques yield the best performance, in your chosen language, is extremely useful. It doesn't mean that you optimise every last line of code you write, or that you won't often deliberately choose a less-than-optimal solution in order to favour a more important goal for a given piece of code.

        It just means that it is worth taking a quick glance at some of the often seemingly pointless benchmarks that get posted here, and even running one or two of your own if you find yourself unable to choose between two ways of doing something for any other good reason. And re-checking your assumptions now and again.
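
        Benchmark.pm makes running your own comparison cheap enough to do on a whim. Something along these lines will do; the linear-scan versus hash-lookup contrast is only a stand-in for whichever two approaches you are weighing:

          use strict;
          use warnings;
          use Benchmark qw( cmpthese );

          my @list    = 1 .. 1_000;
          my %in_list = map { $_ => 1 } @list;
          my @wanted  = map { int rand 2_000 } 1 .. 100;

          cmpthese( -3, {
              # Scan the whole list for every lookup.
              linear_grep => sub {
                  my $hits = grep { my $w = $_; grep { $_ == $w } @list } @wanted;
              },
              # Prebuilt hash; each lookup is a single hash test.
              hash_lookup => sub {
                  my $hits = grep { $in_list{$_} } @wanted;
              },
          } );

        Five minutes of that now and again is usually enough to keep your assumptions honest.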

        It's also worth knowing the costs of over-modularisation and over-engineering.

