in reply to modularization, memory usage and performance

I'm not an expert in this area, but I'd like to start off with one observation. Some people may disagree with your use of require, and that's fine - we can agree to disagree about that.

However, I would like you to be aware of both sides of the coin, as it's not clear to me that you should be using require in these circumstances. Specifically, it sounds like you're switching to something similar to mod_perl. If that is the case, require is actually the wrong thing to use: "use My::Module qw()" is better than "require My::Module", because it causes My::Module to be loaded and compiled when the webserver starts, rather than in each child process that the server kicks off via fork().
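
To make that concrete, here's a rough sketch (My::Module just stands in for one of your own modules):

    # In startup.pl, pulled in while the webserver is starting up:
    use My::Module qw();     # loaded and compiled once, in the parent process

    # versus, inside the handler or CGI script itself:
    require My::Module;      # loaded and compiled at runtime, in each child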

However, if you want your scripts to work both with mod_perl and regular CGI, you may have some trade-offs to make. Since not every CGI call may need all of your modules, using require makes sense - you only load/compile them if you need them. At hundreds or thousands of requests per minute (or second!), this can be significant.
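
As a rough illustration of the lazy-loading side (the module name and the parameter are made up):

    # only pay the load/compile cost on requests that actually need it
    if ($cgi->param('report')) {
        require My::Heavy::Report;           # compiled the first time it's hit
        My::Heavy::Report::generate($cgi);
    }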

Make your choice wisely - each choice hurts performance in the other environment. A possible third choice is to filter your code during the build, and have your Makefile.PL or Build.PL take a parameter that says whether to build for mod_perl or mod_cgi. With mod_perl, you would just filter all of your code as it is being copied to blib with something like s/require/use/ - of course, you'll want a bit more smarts than that, since a blind substitution will also rewrite the word inside quoted strings (comments don't matter much). Maybe you have a keyword __REQUIRE which gets filtered to require for mod_cgi, and use for mod_perl.
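
Roughly, that filter might look like this (the __REQUIRE keyword, the $for_mod_perl flag, and the filehandles are only illustrations of the idea):

    # run over each .pm/.pl file as it is copied to blib
    my $keyword = $for_mod_perl ? 'use' : 'require';
    while ( my $line = <$in> ) {
        $line =~ s/\b__REQUIRE\b/$keyword/g;
        print {$out} $line;
    }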

Re^2: modularization, memory usage and performance
by jmagiera (Novice) on Feb 09, 2005 at 01:14 UTC
    Thanks for your reply, Tanktalus. I think I need to clarify the background of this script a bit more:

    It's designed as a standalone application, not a script running on a public webserver with hundreds of requests. Therefore, the server doesn't fork(), either.

    I'd really appreciate input on the memory leak. If I just reload the same request from the browser, the memory usage keeps growing, slowly but steadily, with each request.

    Oh, I just had an enlightenment! The monks' powers are at work! The script creates a new CGI object every time it's run. Could this cause a memory leak? When does perl collect garbage (if it does at all)?

      Why create a CGI object at all? Doesn't HTTP::Server::Simple give that to you when it calls handle_request?
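
      Something along these lines - a minimal sketch of the HTTP::Server::Simple::CGI interface, so the response itself is just an example:

          package MyServer;
          use base 'HTTP::Server::Simple::CGI';

          sub handle_request {
              my ($self, $cgi) = @_;    # $cgi is a CGI object handed to you
              print "HTTP/1.0 200 OK\r\n";
              print $cgi->header,
                    $cgi->start_html('hello'),
                    $cgi->p('Hello, ' . ($cgi->param('name') || 'world')),
                    $cgi->end_html;
          }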

      But to answer your question: garbage collection happens, as I understand it, when the last reference to an object disappears. As long as something refers to an object, it sticks around. This can be annoying, for example, when you've got a closure, so be careful. ;-)
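
      For instance - a made-up illustration, with load_huge_structure() standing in for whatever builds your data:

          sub make_counter {
              my $big   = load_huge_structure();   # hypothetical helper
              my $count = 0;
              # Because this anonymous sub refers to $big and $count, both of
              # them stay in memory for as long as the sub itself is alive.
              return sub { $count++; return $big };
          }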

      Also note that as you use the CGI module, it loads and compiles more of itself on demand, and that code stays in memory in case it's called a second time. So if each request causes you to call more parts of CGI.pm, that isn't a leak - you used more code, so more code had to be loaded and compiled. Not using CGI any more than you have to (e.g., using templates for HTML rather than CGI's HTML-generating functions) may be a good idea here for many reasons, including this one.

        I have found that the data hash that is tie()d with Tie::Persistent uses up a lot of memory when it contains a lot of data. The funny thing is, even if I tie the data to a local hash, the memory is still allocated and not freed again. E.g.:
        sub loadData {
            my $file      = shift;
            my %localhash = ();
            tie   %localhash, 'Tie::Persistent', $file, 'rw';
            untie %localhash;
        }

        Maybe I don't understand how perl counts references, but in my view that %localhash is a goner.