in reply to Re^7: Timing concerns on PerlEX/Mod_Perl
in thread Timing concerns on PerlEX/Mod_Perl

Well, a centralized system is good... when it works. For example, Perl: the machine code that makes Perl possible is good, but when there is a fault in that machine code itself, it makes things more difficult. Still, given the way the site is built, a central system just seems to make more sense.

Plus, 3,500 is kind of cheating, for two reasons. First, of the 1.7 GB, I would say 900 MB or so is the actual figure; the rest is Windows services and the like. Second, even if you do run 3,500 instances, they would each also spawn their own perl.exe, which in turn would consume more resources, no?

mod_perl/PerlEX isn't really that bad, because it saves on startup. Perl's DLL itself is around 900 KB, so loading it 1,000 times would mean roughly 900,000 KB. So even if I don't use its pre-loading abilities, I save a lot on startup. The downside, though, is the lack of documentation and the difficulty of debugging: every code update forces me to reset the entire server and start it up again, which makes debugging in a live environment a living hell.

For now I will probably just have to buy more servers and load-balance until I can find a solution to the issue. I was hoping someone here is familiar with the way mod_perl/PerlEX works and has experienced similar issues. The execution of the code takes only 0.2 seconds, but startup takes 20 seconds, which doesn't make sense, other than perhaps the threads it creates having issues accessing the same namespace. Also, PerlEX doesn't use main as its primary namespace but instead uses PerlEX::Instance ID blah blah blah, so I am thinking maybe I am forcing it into using main and thus causing slowdowns. But these are all *hunches* and I can't say for sure, so while I try things I am hoping someone might recognise what the issue could be.
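
What I mean by "forcing it" is roughly this (only a sketch; the names below are made up, and the real package is whatever PerlEX generates per instance):

    use strict;
    use warnings;

    # Log which package PerlEX actually compiled this script into,
    # instead of assuming it runs as main::.
    warn 'running under package: ', __PACKAGE__, "\n";

    # The suspect pattern: library code writing straight into main::,
    # which drags every instance back into one shared namespace.
    # $main::config = load_config();    # hypothetical example of the pattern

    # What I want to try instead: let the data live in whatever package
    # PerlEX chose for this instance.
    our $config;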

Re^9: Timing concerns on PerlEX/Mod_Perl
by BrowserUk (Patriarch) on Jul 26, 2008 at 23:24 UTC

    First off, there would be no point in running 3500 instances. It was just by way of example to show what a resource hog Apache is.

    even if you do run 3,500 instances, they would each also spawn their own perl.exe, which in turn would consume more resources, no?

    Yes & no. Yes, each would run its own copy of Perl. No, that wouldn't consume vast amounts of resource. Under Win32 (and probably under *nix, but that's not my domain), when you run a second copy of an executable, the executable and static data segments of the process are shared. That is, only one copy is loaded into memory; only the stack and heap segments are unique. So starting a second copy of either tiny.exe or perl.exe costs very little--just their stack and heap allocations, and those can be set very small and allowed to grow on demand.

    In theory, when Apache/mod_perl forks, the preloaded chunks of Perl code are shared by COW--BUT IT AIN'T TRUE! Every time a forked copy executes code from the preloaded cache and does any one of a number of simple things--like taking a reference (to anything!), or incrementing, or decrementing, or in some cases even just printing the value of a scalar--whole chunks of the COW-"shared memory" have to be allocated and copied. So, the mod_perl hack to avoid loading time just trades that for piecemeal, on-the-fly memory allocations and copying. And the more you preload, the worse it gets. Hence your problems I think.
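
    A bare-bones sketch of the effect (plain fork on a *nix-ish system, not mod_perl itself, and the array is just a stand-in for preloaded data): nothing the child does looks like a write, yet each line modifies the SVs and so dirties the supposedly shared pages.

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @preloaded = (1 .. 1_000_000);    # stands in for preloaded module data

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: "harmless" operations that still write into the SVs.
            my $ref = \$preloaded[0];           # taking a reference bumps that SV's refcount
            $preloaded[1]++;                    # a one-off increment writes to the SV
            print "child saw $preloaded[2]\n";  # interpolation stringifies the IV, setting its PV
            exit 0;
        }
        waitpid($pid, 0);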

    Conversely, perl cgi scripts are individually quite small (compared to their loaded footprint), and modern servers do a pretty amazing job of keeping frequently used files in cache. The same memory you are utilising to cache your mod_perl-loaded code just in case it is needed is far better devoted to letting the system cache the scripts that actually are used!

    Most web sites--not all, I know, but most--have (maybe) two or three dozen oft-used cgis. Now imagine that you had one instance of tiny (or lighttpd or nginx) set up to service each of those cgis, and a reverse proxy to distribute the requests to them (plus a static page server or two, and an image server or two). Each one can handle hundreds if not thousands of concurrent requests. You get fault tolerance, load distribution etc. And go one step further and have each cgi server run the single cgi it serves over a fastcgi connection to a matching perl instance.
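
    The per-script end of that fastcgi connection can be as small as this (just a sketch using the CPAN FCGI module; the response body is obviously whatever your cgi currently produces):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use FCGI;    # CPAN module; one persistent perl process per script

        my $request = FCGI::Request();

        # Compile once, then sit in a loop answering requests forwarded by
        # the front-end web server -- no per-request perl startup cost.
        while ($request->Accept() >= 0) {
            print "Content-type: text/html\r\n\r\n";
            print "<html><body>Hello from a persistent worker</body></html>";
        }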

    Apache, and 'centralisation' in general, serve only to complicate things. With all your eggs in the same basket, finding the bad egg (bugs) is a total PITA--as you are discovering. By keeping individual things separated, you have the opportunity to concentrate your efforts on tuning those scripts that need it. The ones that get hit hardest. If need be, you can substitute a second layer of load balancing for any node and distribute load where needed. And if one script dies catastrophically, only that script is affected. The rest of the site continues oblivious to the problem.

    Monitoring for failures and generating notifications is trivial. And the process of post mortem is far easier, because only the logging from that particular cgi is in that server's logs.

    Need to add a second (or more) physical server to the mix? 'Tis easy: just split the individual instances across the machines according to their time/resource usage.

    People seem to have forgotten the *nix philosophy of having each process do one thing and do it well. Programs like Apache, which contain everything including the kitchen sink (with 2½ bowls, a spray head and hands-free tap, and a waste digester!), load everything anyone might ever need. But there are probably only a handful of sites that ever use more than half of it.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I have tried using Apache as a reverse proxy, and lighttpd (with and without FastCGI), and it didn't seem to work too well. But then again, that is probably because I have such big libraries and would need to break them down.

      Would breaking the code down into smaller and simpler scripts, using a tiny web server and reverse proxy, and then pulling the output together with SOAP/REST work? Or will I ultimately lose out on the communication?
      So, the mod_perl hack to avoid loading time just trades that for piecemeal, on-the-fly memory allocations and copying. And the more you preload, the worse it gets. Hence your problems I think.

      I thought that might have been the issue as well, but here is the strange thing: I run three sites under the same shared pool in IIS/PerlEX, plus an Apache instance with mod_perl (all on the same single server).

      One IIS instance is public, while the other two are on different, non-public subdomains; the same goes for Apache. The public instance experiences these issues while the non-public ones and Apache run perfectly. So unless it hit a cap on the size a namespace can be (is that possible?), or it locks a certain portion of the namespace (RAM) too many times and limits the connections... but this is all guesswork. Is there any way to test this?

      On another note, I will experiment with Tiny a bit to see if it will come in useful in the future. Thanks!
        so unless it hit a cap on the size a namespace can be (is that possible?)

        Like I said. Mod_perl and Apache aren't things I have much experience of. Just enough to know that I don't want any more.

        But, I'll venture an opinion based upon the discussion so far and my knowledge of Perl in general. And say that I think it highly unlikely that this has anything to do with namespace capacities, which are unlimited within the constraints of memory availability, and which you say you are not running out of.
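
        If you want to sanity-check that anyway, something as crude as the following sketch would do (the package list is a placeholder; run it from inside the script itself so __PACKAGE__ reports whatever PerlEX chose):

            use strict;
            use warnings;

            # Count the entries in a package's symbol table (stash).
            sub count_symbols {
                my ($pkg) = @_;
                no strict 'refs';
                return scalar keys %{ $pkg . '::' };
            }

            # 'main' plus whatever package the PerlEX instance compiled
            # this script into (__PACKAGE__ at file scope reports that).
            for my $pkg ('main', __PACKAGE__) {
                printf "%-40s %d symbols\n", $pkg, count_symbols($pkg);
            }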

        Without seeing what you're doing--and from the sound of things, this is far too much to dump here in a post--it is really hard to suggest what would be the cause of a sudden and prolonged slowdown like this. But, once again, I'll hazard a guess.

        I'm betting it is to do with access to the DB. Specifically, I'm betting that you're running out of db handles and are having to wait until the DB server times out some existing connections before it will allow you to make a new connection. I'm guessing that you are instantiating new connections somewhere in your libraries, but never closing them. That's nothing more than a lot of supposition (I had to check back through the thread to see if you'd actually mentioned using a DB) and a vague memory of something similar.

        The quickest way to determine if I'm right would be to check your DB logs for a time period when one of these slowdowns has occurred and see what, if any, relevant errors you can find.

        If that is the cause, there might be a relatively easy fix. Good luck.
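
        If it does turn out to be leaked handles, the usual band-aid looks something like this (only a sketch, assuming DBI; the DSN and credentials are made up):

            use strict;
            use warnings;
            use DBI;

            # Hand out one cached handle per process instead of opening a fresh
            # connection deep inside the libraries and never disconnecting.
            # connect_cached returns the same live handle for identical arguments.
            sub dbh {
                return DBI->connect_cached(
                    'dbi:ODBC:MySiteDSN',                 # hypothetical DSN
                    'db_user', 'db_password',             # hypothetical credentials
                    { RaiseError => 1, AutoCommit => 1 },
                );
            }

            # Library code then calls dbh() wherever it used to call DBI->connect().

        I gather the mod_perl world also has Apache::DBI for much the same purpose, but that's outside my experience.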


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.