amw1 has asked for the wisdom of the Perl Monks concerning the following question:

The problem

We have a reasonably large collection of packages (over 100). We are starting to run into problems with load times and, more specifically, the amount of code that needs to be compiled and loaded into memory. The issue stems from the cascade of use statements: script A uses package B, which in turn uses package C, which, through a huge chain of dependencies, ends up 'use'ing 50% of our packages. We end up with a situation where the login page includes a module that is entirely unrelated to the login process, because one of the modules needed for login needs the unrelated package for a few of its routines.

I am fully aware that this is a hole that we dug for ourselves and that the real answer is to structure our package dependencies in such a way that this doesn't happen. That said, at this point in time the cost to go through and restructure 100k+ lines of code is prohibitive. We are going to be making baby steps to resolve these issues as we have time, but we aren't going to be able to put everything on hold while we fix it.

The idea for a fix

An idea that we've been toying with is to use UNIVERSAL::AUTOLOAD to deal with loading the appropriate packages at runtime. Through some inspection of some of the problem packages it turns out that many of the included packages are included for a function that only gets used in a few specific places. If those functions aren't called, we don't need to load the package.

The important thing to note is that we are running CGIs. They load, run, then exit, so there is no obvious win in having everything loaded up front. We don't use exporting very often (for our in-house packages) and we don't use OO functionality in many places either. The other thing to note is that we tend to use fully qualified function names, Foo::Bar::func(), so we'd be able to reliably figure out which package needs to be loaded.
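To make the idea concrete, here is a minimal, self-contained sketch of such a hook. Demo::Math and the temp-dir setup are purely illustrative (they just let the sketch run on its own); also note that installing an AUTOLOAD in UNIVERSAL intercepts every failed sub *and* method lookup in the whole process, which is exactly the kind of global behavior the replies below warn about.

```perl
# Sketch of the UNIVERSAL::AUTOLOAD idea: map a fully qualified call
# like Foo::Bar::func() onto "require Foo/Bar.pm" and retry the call.
use strict;
use warnings;
use File::Temp qw(tempdir);

package UNIVERSAL;

our $AUTOLOAD;

sub AUTOLOAD {
    my $sub = $AUTOLOAD;
    return if $sub =~ /::DESTROY$/;      # never try to load destructors

    my ($pkg, $func) = $sub =~ /^(.*)::([^:]+)$/;

    (my $file = $pkg) =~ s{::}{/}g;
    require "$file.pm";                  # dies with the usual message on failure

    # The module must now define the sub, or we would loop forever.
    my $code = $pkg->can($func)
        or die "AUTOLOAD: $file.pm loaded but did not define $sub\n";

    goto &$code;                         # re-dispatch with @_ intact
}

package main;

# Set up a throwaway Demo::Math module that nothing ever 'use's.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/Demo" or die $!;
open my $fh, '>', "$dir/Demo/Math.pm" or die $!;
print $fh "package Demo::Math;\nsub double { \$_[0] * 2 }\n1;\n";
close $fh;
unshift @INC, $dir;

# No use/require anywhere in the caller -- the call itself pulls it in.
my $answer = Demo::Math::double(21);
print "$answer\n";    # 42
```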

The question

We are wondering if there is anything inherently broken with this approach. At the end of the day the real answer is to restructure the code to avoid this problem. However, we're looking for something to fix some of our issues before we are able to fix it entirely.


Replies are listed 'Best First'.
Re: Dealing with large chains of dependent packages
by jhourcle (Prior) on Aug 01, 2005 at 18:35 UTC

    If you have enough memory, and you're using Apache, you may want to look into using mod_perl.

    Although there may be some quirks to deal with, it'll keep one copy of each module in memory, that all of the CGIs can share, so there isn't the loading and teardown for each page call.

    Of course, this won't solve the general case of the issue that you're seeing. (The answer to that is to not 'use' modules up front, but to 'require' them as you need them, and then import the functions that you need, if any; as you said, you're using fully qualified functions.)

    As you've said you're using fully qualified function calls, you may have a leg up on the second approach.

Re: Dealing with large chains of dependent packages (require)
by tye (Sage) on Aug 01, 2005 at 19:05 UTC

    One solution that I didn't see mentioned is to replace use with require(s) at the first point(s) where that module's feature(s) are actually needed. This is likely an inappropriate solution for many cases, but it can be a simple and very effective solution in some cases.

    - tye        
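    A minimal sketch of this push-the-use-down pattern, with core Data::Dumper standing in for one of the in-house packages (the sub name is illustrative):

```perl
# Move the 'use' down to the first point where the feature is needed.
use strict;
use warnings;

# Before: use Data::Dumper;   # compiled at startup even if never needed

sub debug_dump {
    my ($thing) = @_;

    # After: compiled on first call; 'require' is a cheap no-op after that.
    require Data::Dumper;

    return Data::Dumper::Dumper($thing);
}

print exists $INC{'Data/Dumper.pm'} ? "loaded\n" : "not loaded yet\n";
my $out = debug_dump([1, 2]);
print exists $INC{'Data/Dumper.pm'} ? "loaded\n" : "not loaded yet\n";
```

    A run that never calls debug_dump never pays for compiling Data::Dumper at all.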

Re: Dealing with large chains of dependent packages
by 5mi11er (Deacon) on Aug 01, 2005 at 18:33 UTC
    Forgive me for being presumptuous, but I would venture to guess that the majority of the problem is created by a few widely used packages.

    If this is the case, I'd think that fixing a few widely used packages might reduce the problem a lot; thereby reducing the time constraints such that you'd be able to clean up most of the package collection over a more extended period of time.

    -Scott

    Update: For a more specific suggestion, I'd call your attention to this:

    "Through some inspection of some of the problem packages it turns out that many of the included packages are included for a function that only gets used in a few specific places."
    Remove those seldom-used sections, and create yet another package for those.

      Excellent advice.
Re: Dealing with large chains of dependent packages
by derby (Abbot) on Aug 01, 2005 at 18:49 UTC

    Yep, the real answer is to re-factor the package that pulls in half the other packages - that just doesn't sound right!

    I think there are three approaches outside of re-factoring:

    -derby
      Some caveats about SelfLoader. We've been using it in production here at my company for a while and it has a few gotchas:

      1) Debugging becomes more difficult, as line numbers aren't represented correctly, and neither is the actual error: often the error will be reported in the highest-up module used in your tree. I recommend having a script that comments out the SelfLoader code while you are in development and puts it back in when you are about to push things to production.

      2) Many editors get confused by SelfLoader because your actual subroutines sit below the __DATA__ declaration. I don't know how to trick Emacs or vi into seeing these correctly, but jEdit maintains its highlighting correctly.

      The speed gains in a non-mod_perl environment are nice, though.
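      For reference, a self-contained sketch of the SelfLoader layout being discussed. Demo::Lazy and the temp-dir setup are illustrative; the point is that everything below __DATA__ is stored as text at load time and only compiled on first call.

```perl
# Demonstrates the SelfLoader layout: subs placed after __DATA__ in a
# module are compiled only on first call, so startup skips them.  We
# write a small illustrative module to a temp dir and load it, so the
# layout can be seen and exercised in one place.
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/Demo" or die $!;

open my $fh, '>', "$dir/Demo/Lazy.pm" or die $!;
print $fh <<'END_MODULE';
package Demo::Lazy;
use SelfLoader;              # installs an AUTOLOAD that compiles from __DATA__

sub always_needed { 'cheap' }    # compiled at load time as usual

1;

__DATA__

# Compiled only the first time it is called; until then it is plain text.
# (Per the caveat above, run-time errors in here report odd line numbers.)
sub rarely_needed {
    my ($n) = @_;
    return $n * $n;
}
END_MODULE
close $fh;

unshift @INC, $dir;
require Demo::Lazy;

print Demo::Lazy::always_needed(), "\n";    # available immediately
print Demo::Lazy::rarely_needed(7), "\n";   # first call triggers the compile: 49
```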

Re: Dealing with large chains of dependent packages
by Joost (Canon) on Aug 01, 2005 at 20:57 UTC
Re: Dealing with large chains of dependent packages
by blahblahblah (Priest) on Aug 02, 2005 at 04:11 UTC
    In the same situation a few years ago, we came up with the same idea and went with it. It has worked well for us.
    benefits:
    1. If I remember correctly, we saw noticeably lower memory usage and a just-barely quicker startup time.

    2. Merging changes, patching old releases, and working together with a large team is much easier. Instead of constantly merging changes to the same large files, now we're typically each working on tiny files that don't affect one another.

    3. Copying code from one script to another doesn't result in annoying errors when we forget to require the right lib files. We require one lib file with the AUTOLOAD and it takes care of the rest, including nice error reporting and logging when something goes wrong (sub not found, sub doesn't compile, etc.).

    I've really grown to like our current system, more because of #2 and #3 than #1.

    drawbacks:
    Once in a while you're going to run into weird things caused by the autoload. I think we had a problem with sort subs in other packages, for example:

    package A;
    my @sorted = sort B::notYetLoaded @unsorted;
    We've found workarounds for all of the problems, but each one took time.
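    One shape such a workaround can take (the names here are illustrative, not the poster's actual code). Two things bite in the snippet above: 'sort SUBNAME LIST' does not go through AUTOLOAD, so a lazily loaded comparator looks undefined, and a comparator living in another package reads its own $a/$b rather than the caller's.

```perl
# Sketch of the sort-sub gotcha and one workaround.
use strict;
use warnings;

package Util::Sort;

# A ($$) prototype makes sort pass the two values in @_, which
# sidesteps the cross-package $a/$b problem entirely.
sub by_length ($$) {
    my ($x, $y) = @_;
    return length($x) <=> length($y);
}

package main;

# Under lazy loading, an explicit require before the sort is the fix,
# since the sort itself will not trigger AUTOLOAD:
#   require Util::Sort;
my @sorted = sort Util::Sort::by_length qw(ccc a bb);
print "@sorted\n";    # a bb ccc
```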

    Also, I recently tried to use inheritance (we only recently started doing much object-oriented programming). Working with our AUTOLOAD (while trying to preserve its behavior for all our existing code) was ugly enough that I gave up inheritance for now.

    I agree with others above who have suggested mod_perl and other cleaner solutions -- you'll get a much bigger boost from mod_perl than from this -- but if those 100K lines of code weren't written with mod_perl's persistence in mind, that could be daunting.

    -Joe

Re: Dealing with large chains of dependent packages
by Anonymous Monk on Aug 02, 2005 at 02:31 UTC
    There is a lot wrong with the approach. You're using modules, and you're complaining that they don't need the functions they pull in 'enough.' If they didn't need them at all, then the modules shouldn't be including them. If they need them but that isn't practical, then your problem lies with the module's author.

    The second problem is: "There is no obvious win by having everything loaded up front." Which would imply a lack of education. Trading nice compile-time errors for a greater risk of run-time errors is never a good thing. And that isn't the only advantage: compile time can take its time, while run time is supposed to be fast. The alternative is waiting while running, rather than waiting while compiling.

    Require is out of style for a reason; leave it that way. I wouldn't bother hacking modules apart to save compile time. I would take the suggestion of moving to mod_perl, or Mason. Very easy to set up, and then you can compile once and execute over and over again.

    Evan Carroll
    www.evancarroll.com
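    The compile-time vs run-time trade-off described above can be seen in one short script (No::Such::Module is deliberately nonexistent):

```perl
# A bad 'use' dies at compile time, before anything runs; a bad
# 'require' dies only if and when that line is actually reached.
use strict;
use warnings;

my @ran;

# use No::Such::Module;        # would die at compile time, before
                               # ANY line below ever ran

push @ran, 'started';

if (0) {                        # branch not taken in this run
    require No::Such::Module;   # would die here, at run time, if reached
}

push @ran, 'finished';
print "@ran\n";    # started finished
```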