in reply to Re^2: Optimisation (global versus singular)
in thread Optimisation isn't a dirty word.

It has to be seen that optimising library code, inner classes and similar lower layers at the time of their creation is not premature! It may be before the maturity of the (first) application that uses them, but optimising a library or class prior to its inclusion in an application is an integral part of its maturation.

This is true only to the extent that the interface to the library is influenced by its implementation, is it not? Isn't the point of libraries and library calls to abstract away all the underlying complexity, and to leave the details open to optimizations?

And yes, I'm playing devil's advocate for a moment, because I've seen the exact scenarios you describe, where "we'll speed it up if we have to" inevitably led to "we don't have the time/money to re-write the central data structures/premises underlying the application three weeks before our due date". I'm interested in seeing why those assumptions failed (and they did), because in theory, it all should have worked out...


Re^4: Optimisation (global versus singular)
by Tanktalus (Canon) on Oct 25, 2005 at 17:32 UTC

    I like that you've brought this up. You've raised the all-important practical side of software development. Life is about choices - selecting one thing over another. If the time to get the project live is constant, then you may need to choose between an implementation that works and one that is fast. The warning against premature optimisation is well-deserved: it's far better to be slow and right than to be fast and wrong. I can always throw a faster CPU at the problem to speed it up, but I can't always throw hardware at a problem to correct the answer. That requires programmer time - which is far more expensive.

    The product I work on has speed as a key requirement. It must respond within a certain amount of time - and that amount keeps shrinking. And, just for fun, we keep the hardware constant (or we adjust our requirements based on the new hardware). To meet that requirement, we pay a group of people to focus solely on finding bottlenecks and reducing, if not outright eliminating, them. That's just the management-recognised cost of performance. We pay 20 people to get it right, and 2 people to get it fast: trading money for performance in the same amount of time.

    If you don't have the budget to get everything done on time, working, and fast enough, I would have to drop "fast enough" every time. It's much easier to make a slow, working application fast than a fast, non-working application correct.

    There are some simple things one can do to keep from getting slow in the first place. For example, in C++, I encourage my coworkers to do:

    const size_t len = strlen(some_string) + strlen(other_string) + 1;
    char* copy = new char[len];
    memset(copy, '\0', len);  // zero-fill the whole buffer
    rather than calling those strlens twice each. (Well, actually, I encourage them to put the new and memset into an inline function.) Is that premature? Perhaps - but I look at it as a way to reduce the chance of errors: rather than duplicating code, I'm refactoring it, in a way that allows the compiler to make its own optimisations. That it allows the compiler to optimise isn't a premature optimisation - it's just a smarter way to do something which happens to be a faster alternative.

    I suppose that means that even within "premature optimisation", there is a lot of fuzziness and leeway as to what constitutes "premature". Or even "optimisation".

      I suppose that means that even within "premature optimisation", there is a lot of fuzziness and leeway as to what constitutes "premature". Or even "optimisation".

      That's for sure! I thought the good thing about using C++ was that you had classes to simplify standard programming tasks like string handling!

      If the overhead of the (presumably well optimized) string classes for the language itself is "too slow", it leaves me wondering what's left in the language that's still "fast enough" to be usable? Wouldn't it be faster to resort to pure C at that point?

      Pardon my ignorance on the subject; I learned C ten years ago, but never got around to learning C++; my books kept getting out of date as the language was revised. :-( Isn't the STL part of the C++ language standard these days, or is that a pending idea? I'm all confused! (as usual...)

Re^4: Optimisation (global versus singular)
by BrowserUk (Patriarch) on Oct 26, 2005 at 00:55 UTC
    Isn't the point of libraries and library calls to abstract away all the underlying complexity, and to leave the details open to optimizations?

    Formally (as in formality, not previously), yes. But ... :)

    Unfortunately, it is often the abstraction itself that is the source of the problem. Whilst abstraction can be a great boon in isolating the caller from the code called, that isolation can be a double-edged sword when it comes to performance.

    A trivial example: writing an RGB class, based around a blessed hash with RED, GREEN & BLUE elements, setters and getters, and constructors for building it from a list of 3 values or from a 24-bit integer, is a perfectly sensible approach. Especially if you want the basic manipulations being performed to also work on top of, say, the Windows GDI, which uses 24-bit values for colors. But when using that class to manipulate the colors in images (eg. Image color swapping with GD), where the underlying API requires you to supply a list of 3 values, the performance hit of the conversions and method calls involved would be excruciating.
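
    For concreteness, a minimal sketch of the sort of class being described, in Perl (the package and method names here are purely illustrative, not taken from any real codebase):

        package RGB;

        use strict;
        use warnings;

        # Construct from a list of 3 values.
        sub new {
            my ($class, $r, $g, $b) = @_;
            return bless { RED => $r, GREEN => $g, BLUE => $b }, $class;
        }

        # Construct from a single packed 24-bit integer.
        sub new_from_24bit {
            my ($class, $packed) = @_;
            return $class->new(
                ($packed >> 16) & 0xFF,
                ($packed >>  8) & 0xFF,
                 $packed        & 0xFF,
            );
        }

        # Combined getter/setters.
        sub red   { my $s = shift; $s->{RED}   = shift if @_; $s->{RED}   }
        sub green { my $s = shift; $s->{GREEN} = shift if @_; $s->{GREEN} }
        sub blue  { my $s = shift; $s->{BLUE}  = shift if @_; $s->{BLUE}  }

        # Back to the packed 24-bit form.
        sub as_24bit {
            my $s = shift;
            return ($s->{RED} << 16) | ($s->{GREEN} << 8) | $s->{BLUE};
        }

        # The list-of-3 form that an API like GD's wants.
        sub as_list {
            my $s = shift;
            return @{$s}{qw(RED GREEN BLUE)};
        }

        1;

    Perfectly clean and convenient for handling individual colors - but call as_list() and a couple of accessors for every pixel of a large image, and the method-dispatch and conversion overhead dominates.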

    Once the code has been written around that abstraction at the lower levels, removing it completely, as opposed to substituting another underlying data mechanism whilst retaining the abstraction, would not only mean wholesale changes to the calling code, it would also mean discarding considerable development and testing time that had already been incurred.

    This is a good example of where optimisation at the lower levels would pay handsome dividends right across the board. For example, were it possible to optimise perl's methods and subroutines further (and, contrasting their performance with the current crop of other, similarly dynamic languages, that doesn't seem likely), that optimisation would not only improve the performance of procedural and OO code, it would also remove a barrier to greater use of abstraction.
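
    To put a rough number on that overhead, a quick Benchmark sketch (using the illustrative RGB class above; the exact ratios will vary with hardware and perl version) comparing accessor calls against direct hash and array access:

        use Benchmark qw(cmpthese);

        my $colour = RGB->new( 12, 34, 56 );
        my @raw    = ( 12, 34, 56 );

        cmpthese( -3, {
            # three method calls per iteration
            accessors  => sub { my @l = ( $colour->red, $colour->green, $colour->blue ) },
            # one hash slice, no method dispatch
            hash_slice => sub { my @l = @{$colour}{qw(RED GREEN BLUE)} },
            # plain array copy - roughly the "no abstraction" baseline
            raw_array  => sub { my @l = @raw },
        });

    The accessor form generally comes out well behind the direct forms - which is exactly the barrier to greater use of abstraction being described.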

    It really is a case, again, of balance: the need to balance the theoretically correct and desirable dictates of formal methodology against the practicalities and limitations of one's environment. The tempting solution of "throw hardware at the problem", which can be so effective for high-profit, low-volume applications (like web servers and DB servers), suddenly becomes completely impractical for applications that sit on the other side of the client-server divide, where the hardware numbers in the hundreds or thousands of units, the application returns a low profit (or, more often, a net cost) over its life cycle, and it has to compete for resources with any number of other equally important and, potentially, equally memory- and cycle-hungry applications.

    In a past life I did a good deal of usability testing on a "workstation application". The specification called for a user-perceived response time of 1/10th of a second. In performance testing, this was easily achieved in the first Beta. But when that Beta was rolled out to a limited evaluation group of users, their immediate and strident verdict was "It's too slow!". Performance testing had been carried out on a standard-specification workstation: memory, CPU, graphics card, disc performance, OS type and version were all identical to the users' machines, but still they complained.

    What went wrong was that the test workstations were running nothing else. The software ran quickly because it had all the cycles and all the memory to itself. In the real world, every user had browsers, email clients, word processors and spreadsheets, and at least one (and often several) 3270 emulators running. The net effect was that the combined memory usage pushed the machines into swapping, and performance fell dramatically. The projected cost of upgrading the 10,000 machines that would run the software was huge.

    The solution settled upon was to go into the application and remove several layers of abstraction at the lower levels. Essentially, lots of small items that had each been instantiated as individual objects were instead stored in a single, compact C array and accessed directly. The additional cost of breaking up the conglomerated items into their individual fields each time they were accessed was more than offset by the considerable reduction in memory use, which avoided the need to go into swapping on most workstations.
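
    The original fix was done in C, but the same space-for-cycles trade can be sketched in Perl: hold all the items in one flat packed string and unpack fields on demand, rather than keeping a hash (or object) per item. The record layout below is invented purely for illustration:

        use strict;
        use warnings;

        my $ITEMS  = 100_000;
        my $RECLEN = 12;                  # 3 unsigned 32-bit fields per record

        # Per-item objects: each small hash carries its own overhead.
        my @objects = map { { id => $_, x => $_ * 2, y => $_ * 3 } } 1 .. $ITEMS;

        # Compact alternative: every record packed into one string.
        my $packed = pack 'N*', map { ( $_, $_ * 2, $_ * 3 ) } 1 .. $ITEMS;

        # Unpack a single record on demand (0-based index).
        sub get_item {
            my ($i) = @_;
            return unpack 'N3', substr $packed, $i * $RECLEN, $RECLEN;
        }

        my ($id, $x, $y) = get_item(42);  # pay a little CPU on each access...
        # ...in exchange for a much smaller overall memory footprint.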


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.