in reply to Sheer Size

Optimize for clarity, not speed. As Donald Knuth once said, "premature optimization is the root of all evil." Rarely, if ever, can we determine in advance which parts of a program will truly be the performance hogs. If you start developing your code with optimization as the primary goal, you're going to shoot yourself in the foot. After all, you already know the benefits of writing clear, easily maintainable code. Why risk losing those benefits for the unknown payoff of a performance increase when you don't yet know that performance is an issue?

Consider the following: you have three main processes, A, B, and C. After working on B, you realize that you can spend about a week on it and improve B's performance by 90%. Wow! That's a huge savings. If, however, you discover that B accounts for only 5% of the system's run time before optimization, then you probably wasted a week: a 90% improvement to 5% of the run time buys you only about 4.5% overall. What if you can only improve A's performance by 25% with a week's worth of rewriting? If A takes up 50% of the total run time of the system (before optimization), that's a 12.5% overall improvement, and you have spent your week far more profitably. However, it's often difficult, if not impossible, to truly gauge production usage of our systems until we get real data and put our systems through their paces.
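If it helps to see the arithmetic spelled out, here is a tiny sketch: the overall savings are roughly the fraction of total run time a component accounts for, multiplied by how much you improve that component. The percentages below are just the hypothetical ones from the example above.

    #!/usr/bin/perl -w
    use strict;

    # Overall savings = (fraction of total run time) * (local improvement).
    sub overall_savings {
        my ( $fraction_of_runtime, $local_improvement ) = @_;
        return $fraction_of_runtime * $local_improvement;
    }

    printf "Optimizing B: %.1f%% overall\n", 100 * overall_savings( 0.05, 0.90 );  # 4.5%
    printf "Optimizing A: %.1f%% overall\n", 100 * overall_savings( 0.50, 0.25 );  # 12.5%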

Further, with a system as large as the one you're describing, it's even more important not to optimize for speed while you're developing it. Since those optimizations tend to reduce clarity, you wind up with a huge, difficult-to-maintain system full of optimizations that are probably not, well, optimal. Trust me, maintenance programmers will appreciate a slow but easy-to-maintain system that can then be fine-tuned.

Once you have your system near completion or actually in production, then you can start using Devel::DProf and other tools to figure out where your performance issues really are.
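For reference, the usual Devel::DProf workflow looks something like this (a sketch; substitute your own script name):

    # Run the script under the profiler; this writes a tmon.out file
    # in the current directory.
    perl -d:DProf yourscript.pl

    # Summarize the profile, showing time spent in each subroutine.
    dprofpp tmon.out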

Cheers,
Ovid

Vote for paco!

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Replies are listed 'Best First'.
Re: (Ovid) Re: Sheer Size
by blackjudas (Pilgrim) on Sep 23, 2001 at 01:48 UTC
    While I certainly agree with all of the points you've made, what are your experiences in the field of "large projects" in Perl? While I'm not complaining or labeling Perl as a slow solution, I'd like to see what ceiling _limit_ Perl developers have reached in matters of sheer size. Currently all of the modules are being interpreted when the program runs (a feature that will hopefully be disabled after development). I'm quite happy with the speed of the system at this point, but the amount of bloat makes me wonder about the possibility of reaching a point where the program becomes annoying to use or even completely unusable.

    Currently, at one point, it calculates a great number of variables to come up with a total price per night sold. Each iteration per night takes between 0.5 and 1.5 seconds, so when you multiply that time by the number of nights requested and again by the number of relevant results, this thing could take upwards of 50 minutes to return a result! (Gross, eh? And that's the best scenario on average.) But I am confident that the loops will get optimized, further reducing the time to response.

    I guess what I'd like to know specifically is this: what kind of factor does the number of lines of code play (number of instructions, or better said, # of ;'s, etc.) when using an interpreter such as Perl, and what about the file open calls Perl makes to load each module? Of course the end result is what matters most in a production environment, and the code will get fully profiled before release. What kind of issues concerning usability (i.e. speed) have you run into while working on a large project in Perl?
      I'm not sure those questions give a meaningful metric, in your case.

      Not knowing the intricacies of your program at this point, all we can do is offer guesses based on instinct and experience. If there are 15 to 20 database calls per iteration, my gut tells me that's your bottleneck. (It's possible there's a really poorly coded algorithm in there somewhere, but that's still a lot of database work.)
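      If the database does turn out to be the culprit, the classic first move is to prepare your statements once, outside the loop, and reuse them. A rough sketch with a made-up DSN, table, and column names, not anything from your actual schema:

          use strict;
          use DBI;

          # Hypothetical connection and schema -- adjust for your own.
          my $dbh = DBI->connect( 'dbi:mysql:booking', 'user', 'password',
                                  { RaiseError => 1 } );

          my @nights  = ( '2001-09-21', '2001-09-22', '2001-09-23' );
          my $room_id = 42;
          my $total   = 0;

          # Prepare once, outside the loop ...
          my $sth = $dbh->prepare(
              'SELECT rate FROM rates WHERE room_id = ? AND night = ?'
          );

          for my $night (@nights) {
              # ... and execute it many times inside, rather than
              # re-preparing (or reconnecting) on every pass.
              $sth->execute( $room_id, $night );
              my ($rate) = $sth->fetchrow_array;
              $total += $rate;
          }

          print "Total for ", scalar @nights, " nights: $total\n";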

      As for the lines of code in general, if you're not swapping, you're probably okay. The important thing is how much work you do in each iteration, not how big the program is overall.

      You'd have to go to a fair bit of work to make Perl re-open and re-compile each module each time you want to use it, so I doubt you're doing that. (It's doable, yes, but you really have to want it. There aren't many good reasons to do that, either.) When your program starts, anything used is compiled. Bang, you're in business. As it runs, anything it must require is compiled, once, and you're still in business. You pay your money and you get your hand stamped automagically.
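      In other words (a tiny, self-contained sketch using two core modules to stand in for your own):

          use strict;

          # 'use' happens at compile time: File::Basename is loaded and
          # compiled once, at startup, before any of this code runs.
          use File::Basename;

          sub rarely_needed {
              # 'require' happens at run time: POSIX is loaded and compiled
              # the first time this sub is called, and is a cheap no-op on
              # every call after that.
              require POSIX;
              return POSIX::floor( 3.7 );
          }

          print basename( '/usr/local/bin/perl' ), "\n";   # prints 'perl'
          print rarely_needed(), "\n";                     # prints 3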

      Of course, code size does matter in some cases. If you utterly destroy locality of reference with frequent, long branches, you'll take a performance hit. Then again, Perl's not really your language if you're worried about processor pipelining. It'd be a pretty Baroque program to do that, too.

      Does that make sense?

        Precisely, that's what I was looking for: instinct and experience. The program I describe is merely an example; I'm just looking for any performance issues that arise when Perl compiles large pieces of code. Yes, you're right, one of the bottlenecks is the initial DB calls for each instruction (user, session, event control), plus any more that the code requires to return a result to the calling module.

        As more "lower level" stuff gets tacked on to each instruction, yes, the response time will certainly grow. My immediate concern, though, was that if Perl has to compile 5-6 megs of plain text for each instruction when only 200k of that code is actually relevant, that alone would slow the program down.

        Certainly makes sense, chromatic! Thanks for the reply.