blackjudas has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for input from my fellow monks who have undertaken large projects using Perl. At this point, I'm working on a project which is a plug-in program of sorts (the main program only contains ~100 lines of code); the basis of the program is in the modules it uses. At the moment, I'm looking at a minimum of 15-20 database calls per instruction and roughly 40-50k lines of code. By the end I expect it to reach somewhere between 300-500k lines of code (just a guess based on percentages). Based on your past experiences, could those enlightened ones fill me in as to what I should expect as far as performance hits using Perl? What is the largest project you've undertaken using Perl, and what sorts of methods have you used to increase performance?

Replies are listed 'Best First'.
(Ovid) Re: Sheer Size
by Ovid (Cardinal) on Sep 23, 2001 at 00:06 UTC

    Optimize for clarity, not speed. As Donald Knuth once said, "premature optimization is the root of all evil." Rarely, if ever, can we adequately determine exactly which parts of the program are going to be the true performance hogs. If you start developing your code with optimization as the primary goal, you're going to shoot yourself in the foot. After all, you already know the benefits of writing clear, easily maintainable code. Why risk losing those benefits for the unknown benefit of a performance increase when you don't yet know that performance is an issue?

    Consider the following: you have three main processes, A, B, and C. After working on B, you realize that you can spend about a week on it to increase B's performance by 90%. Wow! That's a huge savings. If, however, you discover that B accounts for only 5% of actual production usage, then you've probably wasted a week: a 90% improvement to 5% of the runtime shaves only 4.5% off the total. What if you can only improve A's performance by 25% with a week's worth of rewriting? If A takes up 50% of the total run time of the system (before optimization), that same week saves 12.5% overall, and you've spent it far more profitably. However, it's often difficult, if not impossible, to truly gauge production usage of our systems until we get real data and put our systems through their paces.
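
    Just to make that arithmetic concrete, here's a minimal sketch (the fractions are the hypothetical ones from the paragraph above):

        use strict;
        use warnings;

        # Overall saving = (fraction of total runtime) x (local improvement).
        sub overall_saving {
            my ($fraction_of_runtime, $local_improvement) = @_;
            return $fraction_of_runtime * $local_improvement;
        }

        printf "Optimizing B: %.1f%% overall\n", 100 * overall_saving(0.05, 0.90);  # 4.5%
        printf "Optimizing A: %.1f%% overall\n", 100 * overall_saving(0.50, 0.25);  # 12.5%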

    Further, with a system as large as the one you're describing, it's even more important not to optimize for speed while you're developing it. Since those optimizations tend to reduce clarity, you wind up with a huge, difficult-to-maintain system whose optimizations are probably not, well, optimal. Trust me, maintenance programmers will appreciate having a slow but easy-to-maintain system that can then be fine-tuned.

    Once you have your system near completion or actually in production, then you can start using Devel::DProf and other tools to figure out where your performance issues are.
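
    If you haven't used it before, a typical profiling session looks something like this (the script name is a placeholder):

        $ perl -d:DProf yourscript.pl   # run under the profiler; writes tmon.out
        $ dprofpp                       # report which subroutines ate the most time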

    Cheers,
    Ovid

    Vote for paco!

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

      While I certainly agree with all of the points you've made, what are your experiences in the field of "large projects" in Perl? I'm not complaining or labeling Perl as a slow solution; I'd just like to see what ceiling Perl developers have reached in matters of sheer size. Currently all of the modules are interpreted when the program runs (a feature that will hopefully be disabled after development). I'm quite happy with the speed of the system at this point, but the prospect of bloat makes me wonder whether the program could reach a point where it becomes annoying to use, or even completely unusable.

      At one point it calculates a great number of variables to come up with a total price per night sold. Each per-night iteration takes between 0.5 and 1.5 seconds, so when you multiply that by the number of nights requested, and again by the number of relevant results, this thing could take upwards of 50 minutes to return a result! (Gross, eh? And that's the best scenario on average; a rough sketch of the multiplication follows below.) But I am confident that the loops will get optimized, further reducing the time to response.

      I guess what I'd like to know specifically is this: what kind of factor does the number of lines of code play (the number of instructions, or better said, the number of ;'s, etc.) when using an interpreter such as Perl? What about the file-open calls Perl makes to read each module? Of course the end result is what matters most in a production environment, and the code will be fully profiled before release. What kind of usability issues (i.e. speed) have you run into while working on a large project in Perl?
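
      For instance, with purely hypothetical numbers for the stay length and result count:

          my $secs_per_night = 1.0;  # midpoint of the 0.5 - 1.5s range above
          my $nights         = 10;   # hypothetical stay length
          my $results        = 300;  # hypothetical number of relevant results
          printf "%.0f minutes\n", $secs_per_night * $nights * $results / 60;  # 50 minutes
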
        I'm not sure those questions give a meaningful metric, in your case.

        Not knowing the intricacies of your program at this point, all we can do is offer guesses based on instinct and experience. If there are 15 to 20 database calls per iteration, my gut tells me that's your bottleneck. (It's possible that there'll be a really poorly coded algorithm, but that's a lot of database work.)

        As for the lines of code in general, if you're not swapping, you're probably okay. The important thing is how much work you do in each iteration, not how big the program is overall.

        You'd have to go to a fair bit of work to make Perl re-open and re-compile each module each time you want to use it, so I doubt you're doing that. (It's doable, yes, but you really have to want it. There aren't many good reasons to do that, either.) When your program starts, anything used is compiled. Bang, you're in business. As it runs, anything it must require is compiled, once, and you're still in business. You pay your money and you get your hand stamped automagically.
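
        To spell that out, the two loading mechanisms look like this (the module names are made up for illustration):

            # Compiled once, at program startup (compile time):
            use My::PricingEngine;

            # Compiled once, the first time this statement executes;
            # repeat executions are no-ops because %INC remembers the load:
            require My::Reports;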

        Of course, code size does matter in some cases. If you utterly destroy locality of reference with frequent, long branches, you'll take a performance hit. Then again, Perl's not really your language if you're worried about processor pipelining. It'd be a pretty Baroque program to do that, too.

        Does that make sense?

Re: Sheer Size
by suaveant (Parson) on Sep 22, 2001 at 23:47 UTC
    Well... just a suggestion, but try to keep your code as modular as you can, so that it'll be reusable later and much easier to deal with. Also, make sure you are reusing code wherever you can. Beyond that, try to think ahead and design your code to be generic at the low levels. I guess my real thrust here is code reuse: nothing is worse than having code in 20 different places that does nearly the same thing, and then realizing you have to change it all. :) (A tiny sketch of what I mean follows my sig.)

                    - Ant
                    - Some of my best work - Fish Dinner
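
    For example, a minimal sketch of pulling one repeated calculation into a single module (the names and the formula are invented):

        package Hotel::Pricing;
        use strict;
        use warnings;
        use Exporter 'import';
        our @EXPORT_OK = ('nightly_total');

        # One canonical home for the calculation, instead of 20 copies:
        sub nightly_total {
            my ($base_rate, $tax_rate, $discount) = @_;
            return ($base_rate - $discount) * (1 + $tax_rate);
        }

        1;

    Then every caller just says use Hotel::Pricing qw(nightly_total); instead of carrying its own copy.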

      That goes without saying on any project, large or small. All the code is written to be quite modular, though I'm sure we've missed a few areas where code is somewhat duplicated; the end result won't have any of those issues (I hope :). At this point all data access, interface, and event handling are done by specific modules, and the rest is quite specific stuff that belongs in its own modules, i.e. accounting, reporting, etc.
Re: Sheer Size
by perrin (Chancellor) on Sep 23, 2001 at 01:55 UTC
    What performance should you expect from Perl? Fantastic, amazing, astonishing performance! Perl is very fast. If your program is slow, it will almost certainly be due to your database design or program architecture.

    I have built some large systems in Perl and Java, and without knowing the specifics of your system I can only give you a few pieces of general advice:

    • Use persistent database connections. Opening database connections is slow. (A small sketch follows this list.)
    • Communication between processes and machines is often a bottleneck.
    • Plan for caching, but don't actually build any until you see which parts run slowly.
    And, as Ovid pointed out, don't waste time optimizing your code for performance until you have a working system.
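
    On the first point, one common approach with plain DBI is connect_cached, which hands back the same live handle on repeated calls instead of reconnecting each time (the DSN and credentials here are placeholders):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect_cached(
            'dbi:mysql:database=rooms;host=localhost',
            'user', 'password',
            { RaiseError => 1, AutoCommit => 1 },
        );

    Under mod_perl, Apache::DBI does much the same thing transparently.
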
Re: Sheer Size
by dragonchild (Archbishop) on Sep 24, 2001 at 17:29 UTC
    At my previous job, I worked on a Perl project that ran to about 20k lines that we wrote ourselves and about 200k lines that we generated from templates but still needed to edit by hand. This was an interactive program that acted as a network element in testing cellphone base stations.

    We had ZERO performance problems with the program. We didn't even try to optimize once. We simply wrote good Perl code and judged our performance by how fast we could make changes to the code. (Since we were in a testing group, we had to keep up with the changes in the programs we were testing as new features were turned on.) We got our turnaround time to under an hour in most cases.

    We also (Thank the Gods!) had no DBI work to speak of, so I can't speak much on that.

    Just as a thought: if this program will always be connecting to the same database(s), I would look at doing for databases what mod_perl does for web requests. That way, you could open the connection once and keep it open; each Perl program that starts can then connect to that long-running process and ask it to do stuff. That should reduce your overhead in DB connections and the like. (A sketch of one way to do this follows.)
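
    One existing implementation of this idea is DBD::Proxy, whose dbiproxy server holds the real connection while clients talk to it over the network. A client connection might look roughly like this (host, port, and DSN are placeholders):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect(
            'dbi:Proxy:hostname=dbhost;port=3334;dsn=dbi:mysql:database=rooms',
            'user', 'password',
        );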

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.