in reply to Faster Perl, Good and Bad News

Functions have a lot of CPU overhead, so store data where you can, and use direct access for that data.

This is entirely too simplistic a view.

Optimization is often about making a trade-off between space and time. Which way to go depends on the characteristics of your (working) application. To prejudge what you'll need to optimize, and how, is a grand mistake. Unless you have a lot of experience. Then, it's merely a mistake.

If you're running up against the edge of your available memory, pushing stuff into memory risks causing virtual memory thrashing, which can be enormously expensive.

Here's an example from your earlier Possible Memory Usage Issues post: You're representing a Tree in memory. You have Node and Leaf objects. During a tree traversal, you need to distinguish between (internal) Nodes and Leafs. How do you do this?

One way is to store a type indicator in each object, and to either provide a common accessor for this field or allow clients to peek into the object directly, so that the type can be queried. Let's assume you're doing pedal-to-the-metal optimization, and have made the dubious choice* to let clients peek into the objects directly. This approach trades away space (the extra slot in each object) for time. Each time you create a Node or a Leaf, you're taking on the (slight) extra overhead of setting up an additional slot or association, and each object carries around the extra space.
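Here's a minimal sketch of that first approach (Node and Leaf come from your post; the array-based layout and the flag living in slot 0 are assumptions for illustration):

package Node;

# Assumed layout: array-based objects, with slot 0 holding the type
# flag. $Node::isLeaf is the slot index that clients peek at.
our $isLeaf = 0;

sub new {
    my ($class, @children) = @_;
    return bless [ 0, \@children ], $class;   # setting the flag costs
}                                             # a little time and space

package Leaf;
our @ISA = ('Node');

sub new {
    my ($class, $value) = @_;
    return bless [ 1, $value ], $class;       # 1 marks a leaf
}

package main;

my $o = Leaf->new(42);
print "leaf\n" if $o->[$Node::isLeaf];        # direct peek, no method call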

Another approach is to dispense with a common type field in favor of a predicate function, which might be implemented as follows:

package Node;
...
sub isLeaf { 0 }
...

package Leaf;
...
sub isLeaf { 1 }
This approach trades away time for space. Because of method lookup,

    if ( $o->isLeaf() ) { ...

is a bit more expensive than

    if ( $o->[$Node::isLeaf] ) { ...

but which is better?

And the answer is: It Depends!

And this is only a simple example. There is a range of space-for-time tradeoffs possible in many applications, including caching (or not caching) frequently calculated intermediate results.
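Caching is the same trade in another guise: spend memory on a table of answers to buy back time. A sketch (the recursive function is a made-up stand-in for something costly):

use strict;
use warnings;

my %cache;   # the space we're spending to buy time

sub expensive {
    my ($n) = @_;
    return $cache{$n} if exists $cache{$n};
    # Stand-in for some costly calculation:
    my $result = $n <= 1 ? $n : expensive($n - 1) + expensive($n - 2);
    return $cache{$n} = $result;
}

print expensive(40), "\n";   # quick with the cache, glacial without it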

Unless you measure a working application on real data, you won't know for sure. To paraphrase Sherlock Holmes, "It is a capital crime to optimize before one has collected data."
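And collecting data is cheap in Perl. Here's a sketch using the standard Benchmark module to race the two approaches above (the one-line stand-in classes are mine; the numbers you'd get on your data are the only ones that matter):

use strict;
use warnings;
use Benchmark qw(cmpthese);

# Stand-ins for the two approaches; slot 0 holds the type flag.
package Node; our $isLeaf = 0; sub isLeaf { 0 }
package Leaf; our @ISA = ('Node'); sub isLeaf { 1 }

package main;

my $o = bless [ 1 ], 'Leaf';

cmpthese( -2, {                 # run each for at least 2 CPU seconds
    method_call => sub { my $x = $o->isLeaf()        },
    direct_peek => sub { my $x = $o->[$Node::isLeaf] },
} );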


*Allowing clients to peek directly into objects increases "coupling", which makes an application harder to maintain and modify. Much of the flexibility of objects comes from hiding their internal representation from clients, which allows implementations to be easily replaced.
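Even a one-line accessor buys much of that flexibility back. Sketching against the slot layout assumed earlier:

package Node;

# Clients now write $o->isLeaf instead of $o->[$Node::isLeaf], so the
# choice of an array slot stays private to the class and can change
# later without breaking any caller.
sub isLeaf { return $_[0][0] }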

Replies are listed 'Best First'.
Re: Re: Faster Perl, Good and Bad News
by abitkin (Monk) on Aug 09, 2002 at 22:48 UTC
    dws

    I think you are saying something similar to what has been said, but in a more complete way. These are things to do for performance tuning, to get your script to run in a "sweet spot" of memory/CPU usage.

    The only reason I even put in the bit about starting with a C data structure is for those who rewrite all their code to help them tune it to run the best that it can.

    You have to put work in to get these performance savings, so I don't expect people to use them past the point at which things run the best for them. Now I'm having to tune this for two machines: one with 128 megs of RAM and a 700 MHz PIII processor, and a 300 MHz PII with 212 megs of RAM. With those limits, I stopped trying to optimize after I got under the requirements of both systems (running in a reasonable time on one, and not using the swap disk on the other). Perhaps this would have been better titled as tips and tricks for performance.

    Personally, I think the real gem is the recursive calls in C, mainly because I didn't find any good examples of recursing over a Perl data structure in C until I asked here.
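    For anyone hunting for such an example, here's a minimal sketch of the idea using Inline::C (a toy structure I made up for illustration, not the code from my actual script):

    use strict;
    use warnings;

    use Inline C => q{
        /* Sum every number in an arbitrarily nested array ref,
           recursing in C. Assumes the structure holds only array
           refs and numbers. */
        double sum_tree(SV *sv) {
            double total = 0.0;
            if (SvROK(sv) && SvTYPE(SvRV(sv)) == SVt_PVAV) {
                AV *av = (AV *)SvRV(sv);
                I32 i;
                for (i = 0; i <= av_len(av); i++) {
                    SV **elem = av_fetch(av, i, 0);
                    if (elem)
                        total += sum_tree(*elem);
                }
            }
            else {
                total += SvNV(sv);
            }
            return total;
        }
    };

    print sum_tree([ 1, [ 2, 3 ], [ [ 4 ] ], 5 ]), "\n";   # prints 15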

    Now then, I'm going to sit in the corner and put my thumb in my mouth, to try to keep my foot out of it.
      Now I'm having to tune this for two machines: one with 128 megs of RAM and a 700 MHz PIII processor, and a 300 MHz PII with 212 megs of RAM.

      Something else to consider when optimizing: what is the cost of hardware, compared with the cost of implementing various solutions? Multiply the time it takes you to develop an XS-based solution by the cost of your time, and compare this to the cost of adding memory to these machines. You can add a 128MB stick to a box for < 75 USD right now. (Granted, once you factor in overhead, including purchasing and labor, it costs an organization more than this.)
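      (With made-up numbers: if an XS solution takes you two days at, say, 50 USD an hour, that's 16 x 50 = 800 USD of your time against a 75 USD stick of RAM.)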

      It may well be that getting capital budget approved is difficult where you work, and that throwing your time at the problem is less expensive. But this is still a calculation that you and your management (or management equivalent) need to make.

          You can add a 128MB stick to a box for < 75 USD right now. (Granted, once you factor in overhead, including purchasing and labor, it costs an organization more than this.)

        dws++ for bringing up the issue of adding hardware (there are many ways of making code fast enough, one of which is throwing Moore's Law at the problem), but if this code's running on a production machine, the cost of adding hardware may be astronomically higher than $75USD once you factor in the cost of downtime. If you don't have a hot failover box, you'll have to shut down the machine for at least a couple of minutes, and in my experience this sort of trivial upgrade ("we're just adding another stick of RAM, what could go wrong?") is exactly the kind of thing that ends up lasting for hours and generating infinite Angry Customer Calls. If that $75 stick of RAM ends up losing you a $100,000 contract, you might have been better off paying a programmer to spend a day profiling, optimizing, and documenting the code.

        I realize that your response was much more sophisticated than just "hardware's cheap, throw hardware at it", but not all costs are obvious, and it's often a mistake to look only at first-order costs. (This happens with a lot of hardware purchases: buying that cheap KVM switch costs you a lot of money when it dies just before a major outage, for instance.)

        That said, if you're running into hardware walls like this on a regular basis, it's probably time to upgrade somewhere in the near future -- like your next maintenance window.

        (And without bringing in a straw-man argument, let me just mention that there are some places -- hospitals, for instance -- where downtime costs more than just money.)

        --
        F o x t r o t U n i f o r m
        Found a typo in this node? /msg me
        The hell with paco, vote for Erudil!