in reply to Re^8: Timing concerns on PerlEX/Mod_Perl
in thread Timing concerns on PerlEX/Mod_Perl

First off, there would be no point in running 3500 instances. It was just by way of example to show what a resource hog Apache is.

even if you do run 3,500 instances, they would each also spawn their own perl.exe, which in turn would consume more resources, no?

Yes & no. Yes, each would run its own copy of Perl. No, that wouldn't consume vast amounts of resource. Under Win32 (and probably under *nix, but that's not my domain), when you run a second copy of an executable, the executable and static data segments of the process are shared. I.e. only one copy is loaded into memory. Only the stack and heap segments are unique. So starting a second copy of either tiny.exe or perl.exe costs very little: just their stack and heap allocations, and those can be set very small and allowed to grow on demand.

In theory, when Apache/mod_perl forks, the preloaded chunks of Perl code are shared by COW--BUT IT AIN'T TRUE! Every time a forked copy executes code from the preloaded cache and does any one of a number of simple things--taking a reference (to anything!), or incrementing, decrementing, or in some cases even just printing the value of a scalar--whole chunks of that COW-"shared" memory have to be allocated and copied. So the mod_perl hack to avoid loading time just trades that for piecemeal, on-the-fly memory allocation and copying. And the more you preload, the worse it gets. Hence your problems, I think.
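Here's a minimal sketch of the effect (my own illustration, not mod_perl code, and Linux-only because it reads /proc/self/smaps): the forked child does nothing but take a reference to each "preloaded" scalar, yet its private dirtied memory grows, because writing the reference counts forces the kernel to copy the pages those scalars live on.

    #!/usr/bin/perl
    # Sketch only: show COW pages being dirtied by nothing more than taking refs.
    use strict;
    use warnings;

    my @preloaded = map { 'x' x 1024 } 1 .. 10_000;    # ~10MB built in the parent

    sub private_dirty_kb {                             # Linux-specific accounting
        open my $fh, '<', '/proc/self/smaps' or return -1;
        my $kb = 0;
        while( <$fh> ) { $kb += $1 if /^Private_Dirty:\s+(\d+) kB/ }
        return $kb;
    }

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if( $pid == 0 ) {                                  # the "mod_perl child"
        my $before = private_dirty_kb();
        for my $sv ( @preloaded ) {
            my $ref = \$sv;                            # read-only intent, but taking a
        }                                              # ref writes the SV's refcount
        my $after = private_dirty_kb();
        print "child: Private_Dirty grew from ${before}kB to ${after}kB\n";
        exit 0;
    }
    waitpid $pid, 0;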

Conversely, perl cgi scripts are individually quite small (when compared to their loaded footprint), and modern servers do a pretty amazing job of keeping frequently used files in cache. The same memory you are utilising to cache your mod_perl-loaded code just in case it is needed is far better devoted to allowing the system to cache the scripts that actually are used!

Most web sites--not all, I know, but most--have (maybe) two or three dozen oft-used cgis. Now imagine that you had one instance of tiny (or lighttpd or nginx) set up to service each of those cgis, and a reverse proxy to distribute the requests to them (plus a static page server or two, and an image server or two). Each one can handle hundreds if not thousands of concurrent requests. You get fault tolerance, load distribution etc. Then go one step further and have each cgi server run the single cgi it serves over a FastCGI connection to a matching perl instance--something along the lines of the sketch below.
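To give a concrete idea of that last step, here's a hedged sketch (mine, not anything from this thread): it assumes the FCGI module from CPAN, an arbitrary port 9001, and a front end (tiny/lighttpd/nginx) already configured to forward requests for this one script to that port.

    #!/usr/bin/perl
    # Sketch only: one cgi script run as a persistent FastCGI responder.
    use strict;
    use warnings;
    use FCGI;

    my $socket  = FCGI::OpenSocket( ':9001', 5 );    # listen on port 9001, backlog 5
    my $request = FCGI::Request( \*STDIN, \*STDOUT, \*STDERR, \%ENV, $socket );

    while( $request->Accept() >= 0 ) {               # one iteration per request
        print "Content-Type: text/plain\r\n\r\n";
        print "Handled by perl pid $$\n";            # same perl instance each time
    }

    FCGI::CloseSocket( $socket );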

Apache, and 'centralisation' in general, serve only to complicate things. With all your eggs in the same basket, finding the bad egg (bugs) is a total PITA--as you are discovering. By keeping individual things separated, you have the opportunity to concentrate your efforts on tuning those scripts that need it. The ones that get hit hardest. If need be, you can substitute a second layer of load balancing for any node and distribute load where needed. And if one script dies catastrophically, only that script is affected. The rest of the site continues oblivious to the problem.

Monitoring for failures and generating notifications is trivial. And the post mortem process is far easier because only the logging from that particular cgi is in that server's logs.

Need to add a second (or more) physical server to the mix? 'Tis easy: just split the individual instances across the machines according to their time/resource usage.

People seem to have forgotten the *nix philosophy of having each process do one thing and do it well. Programs like Apache contain everything including the kitchen sink (with 2 1/2 bowls, a spray head and hands-free tap, and a waste digester!), and load everything anyone might ever need. But there are probably only a handful of sites that ever use more than half of it.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^10: Timing concerns on PerlEX/Mod_Perl
by Anonymous Monk on Jul 27, 2008 at 00:06 UTC
    I have tried using Apache as a reverse proxy and lighttpd (with and without FastCGI), and it didn't seem to work too well. But then again, it is probably because I have such big libraries and would need to break them down.

    Would breaking the code down into smaller and simpler scripts, using a Tiny webserver and reverse proxy, and then loading the output with SOAP/REST work? Or will I ultimately lose out on the communication?
Re^10: Timing concerns on PerlEX/Mod_Perl
by Anonymous Monk on Jul 27, 2008 at 08:30 UTC
    So the mod_perl hack to avoid loading time just trades that for piecemeal, on-the-fly memory allocation and copying. And the more you preload, the worse it gets. Hence your problems, I think.

    I thought that might have been the issue as well, but the strange thing is this: I run 3 sites under the same shared pool in IIS/PerlEx, plus an Apache instance with mod_perl (all on the same one server),

    with 1 of the IIS instances being public while the other 2 are, let's say, on different sub-domains but not public. Same for Apache. The public instance experiences these issues while the non-public ones and Apache run perfectly. So unless it hit a cap on the size a namespace can be (is that possible?), or it places a lock on a certain portion of the namespace (RAM) too many times, limiting the connections... but this is all guesswork. Is there any way to test this?

    On another note, I will experiment with Tiny a bit to see if it will come in useful in the future, thanks!
      So unless it hit a cap on the size a namespace can be (is that possible?)

      Like I said. Mod_perl and Apache aren't things I have much experience of. Just enough to know that I don't want any more.

      But, I'll venture an opinion based upon the discussion so far and my knowledge of Perl in general. And say that I think it highly unlikely that this has anything to do with namespace capacities, which are unlimited within the constraints of memory availability, and which you say you are not running out of.

      Without seeing what you're doing--and from the sound of things, it is far too much to dump here in a post--it is really hard to suggest the cause of a sudden and prolonged slowdown like this. But, once again, I'll hazard a guess.

      I'm betting it is to do with access to the DB. Specifically, I'm betting that you're running out of DB handles and are having to wait until the DB server times out some existing connections before it will allow you to make a new one. I'm guessing that you are instantiating new connections somewhere in your libraries, but never closing them. That's nothing more than a lot of supposition (I had to check back through the thread to see whether you'd actually mentioned using a DB) and a vague memory of something similar.

      The quickest way to determine whether I'm right would be to check your DB logs for a time period when one of these slowdowns has occurred and see what, if any, relevant errors you can find.
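      Another quick, if cruder, check--and this is just my sketch, with a made-up host and credentials, assuming DBI and DBD::mysql--is to watch the server's connection count while a slowdown is in progress and see whether it is pinned at max_connections:

          #!/usr/bin/perl
          # Sketch only: poll MySQL connection usage (hypothetical host/credentials).
          use strict;
          use warnings;
          use DBI;

          my $dbh = DBI->connect( 'dbi:mysql:host=dbserver', 'monitor', 'secret',
                                  { RaiseError => 1 } );

          my( undef, $used ) = $dbh->selectrow_array( q{SHOW STATUS LIKE 'Threads_connected'} );
          my( undef, $max  ) = $dbh->selectrow_array( q{SHOW VARIABLES LIKE 'max_connections'} );

          printf "%s: %d of %d connections in use\n", scalar localtime, $used, $max;
          $dbh->disconnect;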

      If that is the cause, there might be a relatively easy fix. Good luck.


        I am using a database (MySQL), but not persistent connections, so a database handle is opened and closed every time. The Time::HiRes start is taken before the database handle is opened and checked after it is closed, so I would think that if that were the issue, the slowdown would show up in my benchmark. The database server is also on its own dedicated server, so it can handle quite a lot of connections.
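        Roughly, the timing looks like this (a simplified sketch; the DSN, credentials and query are placeholders, not my real code):

            # Simplified sketch of the timing described above (placeholder DSN/query).
            use strict;
            use warnings;
            use Time::HiRes qw( gettimeofday tv_interval );
            use DBI;

            my $t0  = [ gettimeofday ];                       # started before the connect
            my $dbh = DBI->connect( 'dbi:mysql:mydb;host=dbserver', 'user', 'pass',
                                    { RaiseError => 1 } );
            # ... the queries for this request run here ...
            $dbh->disconnect;                                 # handle closed every time
            printf "DB open..close took %.3f seconds\n", tv_interval( $t0 );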