Heffstar has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I'm looking for some wisdom and insight into Apache 1.3 and mod_perl.

My boss and I got into a discussion about proper web server setup and how mod_perl is used, and neither of us is very savvy when it comes to web admin.

Here's the scenario:

Apache 1.3
MinSpareServers 10
MaxSpareServers 20
StartServers 20
MaxRequestsPerChild 10000
MaxClients 500

We're running into a problem where we've got 20 Apache child processes running which keep getting larger and larger as Perl outputs our dynamic pages; all of the code is stored in RAM allocated to each Apache process.

We do a lot of serving of PDFs and the way it's done (I don't have the code in front of me, so forgive me) is something like:

 while (<FILE>) { print $_; }

Would the full contents of the file be held in memory along with the compiled code?
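
For reference, I imagine a more memory-friendly version would read the file in fixed-size chunks rather than line by line, something like the sketch below (the file name and chunk size are just placeholders):

 # Sketch: stream the PDF in fixed-size chunks so only one buffer's
 # worth of data is in memory at a time.  Path and chunk size made up.
 open my $fh, '<', '/path/to/file.pdf' or die "open: $!";
 binmode $fh;      # PDFs are binary data
 binmode STDOUT;
 my $buf;
 while (my $bytes = read($fh, $buf, 64 * 1024)) {
     print $buf;
 }
 close $fh;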

Also, any input on the Apache directives would be appreciated.

Thanks and all the best!

Re: Perl and Apache 1.3
by CountZero (Bishop) on Oct 10, 2009 at 06:09 UTC
    If those PDF files are "static", i.e. they are not being generated "on the spot" for every request, it is advisable to let Apache serve them directly rather than going through a Perl script.

    Also, if possible, switch to Apache 2 and mod_perl 2, which allow you much more flexibility in configuring your application.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      In regard to letting Apache serve the files rather than having the script output them, I'm pretty sure the reason we're doing it this way is that we're trying to keep certain files secure and unavailable to other users.

      If there's an easy way to do this and still have security, please let me know. All of our users access our database and files as a generic web user, not as themselves, if that answers a future question...

        It's been a long time since I used Apache 1.3 (I long since switched to Apache 2), but IIRC even Apache 1.3 could use a basic authentication and authorisation system which can distinguish between users and the files they have access to.
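
        A minimal sketch of what that could look like with mod_auth in Apache 1.3 (the directory, realm name and password-file path are just placeholders):

          # Let Apache serve the PDFs itself, but require a login first.
          # Create the password file with: htpasswd -c /etc/httpd/conf/htpasswd someuser
          <Directory /var/www/protected-pdfs>
              AuthType Basic
              AuthName "Restricted PDFs"
              AuthUserFile /etc/httpd/conf/htpasswd
              Require valid-user
          </Directory>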

        Actually, if you allow your users to run a CGI script which serves the file, how is that different from allowing them to access the file directly? Or does the script itself run an authentication/authorisation scheme?

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Perl and Apache 1.3
by Herkum (Parson) on Oct 10, 2009 at 03:19 UTC

    Because there are no hard numbers here, all we can really do is give you some general rules to watch out for.

    In this case, the main trade-off is the cost of having a child which owns a large hunk of memory versus the cost of having to reload that child every once in a while. If the child slows down the system (for example, because it pushes the machine into swapping), then you should reduce MaxRequestsPerChild until that stops being an issue.

Re: Perl and Apache 1.3
by trwww (Priest) on Oct 11, 2009 at 00:25 UTC

    We're running into a problem where we've got 20 Apache child processes running which keep getting larger and larger as Perl outputs our dynamic pages...

    Then you've got a memory leak. This usually happens in Perl when there is a circular reference to a variable, so it doesn't get properly refcounted and freed. It can also happen when there is a global variable that keeps getting data added to it (imagine an array that gets data pushed onto it on every request).
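
    A contrived example of the circular-reference case, and one way to break the cycle with Scalar::Util::weaken:

      use Scalar::Util qw(weaken);

      # Contrived leak: the two structures point at each other, so their
      # reference counts never drop to zero when the request finishes.
      my $parent = { name => 'parent' };
      my $child  = { name => 'child', parent => $parent };
      $parent->{child} = $child;        # cycle: parent <-> child

      # One fix: make one link a weak reference so the pair can be freed
      # once the last "real" reference to it goes out of scope.
      weaken($child->{parent});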

    As a quick fix, lower the MaxRequestsPerChild to a low number. Maybe 100... or even 10. Otherwise, you're going to have to find the memory leak.

    In general, if all of a program's variables are properly scoped then the memory footprint of the program will not continually grow. It will quickly grow to the amount of memory it needs to perform its task, and then level off.

    all of the code is stored in RAM which is allocated to each apache process

    If you load code and data into mod_perl before Apache forks its children, then that memory will be shared. How big are the files that make up the application? I can't imagine this part being too much of an issue.
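
    For what it's worth, the usual mod_perl 1 way to get that sharing is to preload the big modules in a startup file that runs in the parent before the fork. A sketch (the paths and module list are just placeholders):

      # httpd.conf: run the startup file in the parent process, so the
      # compiled code is shared copy-on-write by all the children
      PerlRequire /etc/httpd/conf/startup.pl

      # startup.pl: preload anything large that every child will need
      use strict;
      use lib '/path/to/our/app/lib';   # placeholder
      use DBI ();
      use CGI ();
      CGI->compile(':all');             # precompile CGI.pm's autoloaded methods
      1;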

    Would the size of the file be stored in memory along with the compiled code?

    How big are the .pdf files? If they are particularly large, you may want to figure out a way for the web app to hand off the .pdf generation to a different, short-lived process that runs in its own memory space and returns the memory to the OS when it finishes. After it runs, do an HTTP redirect so Apache serves the file directly.
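
    A rough sketch of that hand-off (the generator script, its options and the cache directory are all made up for illustration):

      use CGI ();

      # Run the PDF generator as a separate short-lived process, so its
      # memory goes back to the OS as soon as it exits.
      my $job_id = time() . ".$$";
      system('/usr/local/bin/make_report_pdf',
             '--out', "/var/www/pdf-cache/$job_id.pdf") == 0
          or die "PDF generation failed: $?";

      # Then redirect so Apache serves the finished file directly.
      print CGI->new->redirect("/pdf-cache/$job_id.pdf");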

    But you definitely have a memory leak if the child httpds are continually growing and growing in memory size.

      Well, I should be more specific: these processes get larger only when they have CPU activity associated with them. That would mean more compiled code is being loaded into the memory associated with the httpd process, right?

      Also, the associated PDFs are not huge (max 10MB) and the scripts that generate our application are usually around half a MB, but we have one that's about 10MB.

      Since we're talking a lot of Apache configuration, I'll pose a couple of related questions:

      MaxRequestsPerChild, when reached, causes the httpd process to terminate, and if the load is high enough another httpd process will start up, correct?

      Does anyone have any idea how long it would normally take to start up a new process? My boss seems to think that it's up to 20 seconds and that starting a new child is a VERY expensive operation.

      This server is a fairly decent machine though: 3GHz Xeon Dual Core, 4GB. It runs both Apache and MySQL for a few hundred users, but from what I've read, people who know configuration can get away with much, MUCH less...

        Basically wrapping this one up, should anyone read this thread down the road...

        I've set the above settings to the following:
        MinSpareServers 5
        MaxSpareServers 10
        StartServers 8
        MaxRequestsPerChild 2500
        MaxClients 100

        The biggest change I made was turning KeepAlive off. What I found was that with KeepAlive turned on, a child would have to reach MaxKeepAliveRequests multiplied by MaxRequestsPerChild (2000 * 10000 = 20,000,000) requests before it was killed off and its memory released. Just a tad high. With MaxRequestsPerChild set at 2500, my process size reaches ~100MB and then the child dies off.

        Also, turning KeepAlive off allows a server process to immediately serve another client rather than sit around waiting for KeepAliveTimeout seconds.
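
        Pulled together, the relevant httpd.conf section now looks roughly like this:

          # Tuned prefork / keep-alive settings
          KeepAlive           Off
          MinSpareServers     5
          MaxSpareServers     10
          StartServers        8
          MaxRequestsPerChild 2500
          MaxClients          100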

        Thanks for the help monks!