artin has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
we have a dual Xeon 2.4 GHz server with 1 GB (1024 MB) of RAM, running Linux (CentOS),

and we get heavy visitor traffic.

Now we have problems:

Under mod_cgi our CPU load goes up to 20-40, memory usage is very low, and the server is slow.
Under mod_perl the CPU load stays under 1, but memory usage goes over 95% and the server is slow again.
FastCGI doesn't support NPH scripts.
SpeedyCGI: I don't know whether it supports NPH scripts or not.


Anyway, what can I do now to handle these visitors?
What about Perl shared memory, and how would I use it? Can it help me, and how much?

Please recommend what I should do.

Thanks

Re: CgiProxy And Heavy Visitors
by tilly (Archbishop) on Apr 10, 2005 at 07:58 UTC
    You should read the mod_perl strategy guide and pay very careful attention to the section titled Adding a Proxy Server in http Accelerator Mode. If you follow the recommendations there, your problems should be much improved.

    The problem is that mod_perl processes which take up a lot of RAM are having to spoonfeed data over a very thin pipe to client browsers. When you use a proxy server in that configuration, the mod_perl processes spew data to lightweight proxies and move on, and then the slow browsers tie up the lightweight proxies. Therefore you can use fewer mod_perl processes, saving RAM.
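    A minimal sketch of that layout, assuming a plain Apache frontend with mod_proxy (the ports and addresses are made-up examples; the guide also covers doing the same thing with Squid):

        # Lightweight frontend on port 80: forwards requests to the mod_perl
        # backend, then dribbles the responses out to slow clients itself.
        Listen 80
        ProxyPass        / http://127.0.0.1:8080/
        ProxyPassReverse / http://127.0.0.1:8080/

        # Heavy mod_perl backend, reachable only from localhost.
        Listen 127.0.0.1:8080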

    When you get beyond what that will scale to on your hardware, the next obvious move is to add more memory. These days 1 GB of RAM is not that much; go to 2-4 GB. Most acceleration techniques, such as caching, use RAM. After that runs out of steam, you'll need a load balancer in front of multiple webservers. (You probably want that well before you reach that load, just to have redundancy and failover.)

    As for shared memory, perrin did a number of benchmarks a while ago and found that shared memory modules are not that fast. He found that BerkeleyDB and MySQL were the two fastest ways of sharing data between Perl processes on a single machine. This may have changed (SQLite version 3 is supposed to be much improved). However I wouldn't recommend playing around with caching and sharing data for performance increases until you've done the obvious use of proxies for http acceleration.
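    For reference, if you do go that route later, sharing data between processes through BerkeleyDB looks roughly like this (a minimal sketch; the path and key are made up, and you'd want to think about error handling and locking for real code):

        use BerkeleyDB;

        # One environment and database file, opened by every Apache child.
        my $env = BerkeleyDB::Env->new(
            -Home  => '/var/cache/myapp',
            -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_CDB,
        ) or die "env: $BerkeleyDB::Error";

        tie my %cache, 'BerkeleyDB::Hash',
            -Filename => 'shared.db',
            -Flags    => DB_CREATE,
            -Env      => $env
          or die "db: $BerkeleyDB::Error";

        $cache{session_count} = ($cache{session_count} || 0) + 1;   # visible to all children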

    Aside from raw performance, two other problems with shared memory are that locking hotspots in RAM can become a bottleneck on your webserver, and you make it harder to later choose to load-balance several machines. Trust me, if volume continues to increase, you really want to be able to load-balance several machines.

      So: 1) I should use mod_perl, 2) install Apache2 with the prefork MPM, 3) set up a Squid cache server, 4) add more RAM, 5) set up load balancing. Right?
        There is no need to go to mod_perl 2. The rest is right. Step 5 is generally good advice for failover, but depending on your situation the previous steps may be enough.

        If they are not then you probably have a memory leak somewhere, and it would behoove you to track it down. (It is probably also a good idea to configure Apache so that kids will kill themselves after they get too big.)
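        One common way to do that (a sketch, assuming mod_perl 1 and the Apache::SizeLimit module; the threshold below is an arbitrary example, not a recommendation):

            # In startup.pl:
            use Apache::SizeLimit;
            $Apache::SizeLimit::MAX_PROCESS_SIZE = 150_000;   # in KB: recycle a child once it grows past ~150 MB

            # In httpd.conf:
            #   PerlFixupHandler Apache::SizeLimit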

        Now I've ordered 3 GB of extra RAM,
        so 4 GB will be enough, right? ;)
Re: CgiProxy And Heavy Visitors
by jbrugger (Parson) on Apr 10, 2005 at 07:27 UTC
    Try preloading the modules you need in your startup.pl (under mod_perl); the preloaded modules are then shared by all Apache child processes (see the startup.pl sketch below).
    You should use the Apache2 prefork MPM, however, as I found out here: (shared) memory and preloading modules using Mod-perl; that way the modules don't have to be reloaded in each child.
    Next, you can change the Apache config and set MaxRequestsPerChild to a lower number, so that if there are small leaks in a script (I hope not ;-)) the child process is stopped and its memory is reclaimed.
    Finally, mod_perl does indeed use more memory: it keeps compiled code in memory, and that's what makes it fast.
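    A minimal startup.pl along those lines (the module list here is just an illustration; preload whatever your scripts actually use):

        # startup.pl -- loaded once in the Apache parent, so the compiled
        # code is shared (copy-on-write) by all the children.
        use strict;
        use CGI ();
        CGI->compile(':all');          # precompile CGI.pm's autoloaded methods
        use DBI ();
        use MyApp::Common ();          # hypothetical module your scripts share

        1;

    It gets pulled in with PerlRequire /path/to/startup.pl in httpd.conf, and MaxRequestsPerChild (say, a few hundred) then takes care of recycling children before any small leaks add up.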

    "We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.
Re: CgiProxy And Heavy Visitors
by jhourcle (Prior) on Apr 10, 2005 at 12:47 UTC

    If the load is high under mod_cgi, and memory intensive under mod_perl, you might be able to find a middle ground by reducing the number of servers forked. (I know this is counterintuitive -- but by reducing the total number of child processes, you reduce the total amount of memory used. If you can keep memory usage below the point where you start paging, you may actually improve the overall performance.)
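    In httpd.conf terms that mostly means capping MaxClients for the mod_perl server (the numbers below are illustrative, not a recommendation):

        StartServers         5
        MinSpareServers      3
        MaxSpareServers      8
        MaxClients          20    # fewer heavy children = less total memory, less paging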

    Another option is to try to find which process is the memory hog. Because Perl doesn't give memory back to the OS, once a child process has grabbed that memory it will never be released. Because of what you said about the performance under mod_cgi, I would assume it's not a script that people call all that often (well, not that often as a percentage of the total hits). If you can figure out which script it is, you can either refactor it so it's less memory intensive, or move it so it's being run under mod_cgi.
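    If you want numbers rather than guesses, one rough way to spot the hog (a sketch, assuming the GTop module -- the libgtop bindings -- is installed) is to log each child's size after serving a request:

        use GTop ();

        my $mem = GTop->new->proc_mem($$);
        warn sprintf "pid %d after %s: size %.1fMB shared %.1fMB\n",
            $$, $0, $mem->size / 2**20, $mem->share / 2**20;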

    And, well, the option of throwing memory at the problem isn't all that bad. It'll fix the symptoms, even if it doesn't fix the underlying problem. Memory goes for about $100-150/GB these days, depending on the exact speed/type, and it might take you a day or more to track down the problem and refactor it, which would cost the company more than the memory. However, that only helps in the short term. If you can 'fix' the problem and get the server running at 25% of its capacity, you've effectively won back the other 75% of the server's value. (And of course there's the potential lost business, or lost brand reputation, if the problem drags on for too long.)

    I personally would probably get the memory, but also try to track down the problem so I know what was causing it and can keep it from happening again in the future. I don't know what your budget looks like, or what the political situation is in your company, so this might not be the best option in your situation.

    Unfortunately, server tuning has a whole lot of variables involved, so there may be any number of things you can do to improve your performance. (I don't even know if you're making calls to other systems or databases, or whether there are other competing processes running.)

      Thanks!
      With this server, these were my maximum numbers in Apache:
      mod_perl: 30 requests per second, plus an automatic restart of Apache every 10 minutes because of low memory!
      mod_cgi: 45 requests per second!

      Load balancing is very hard and would cost a lot of money! I asked my datacenter to add 3 GB of extra memory,
      so 1 GB + 3 GB = 4 GB. Is 4 GB enough, or am I wrong?
      Under mod_perl I only had the memory problem; the server load was under 0.50.
      I am running just CGIProxy on this server and nothing else.

        You haven't been very specific on exactly how many different scripts you're running, or what those scripts are doing that might require them to use so much memory. That information could result in a significantly different recommendation.

        Without knowing just what's going on, I have no idea if 4GB is enough ... or what even qualifies as 'enough'. Making sure it's not going to be paging under its current load is one thing, but I'm guessing your company won't be happy if you creep back up over the line before the year's out. (and I'm guessing you will, with any significant growth, given that you're restarting mod_perl every 10 minutes.)

        I'd suggest that you do more than just throw memory at the problem, and try to determine what the real problem is. I remember once being rushed to the emergency room, and after many hours there, being released and told to take a pain killer, because they had no idea what was wrong. (It took another 18 months, 3 different doctors, and a whole bunch of other medical problems before I figured out on my own that I had become lactose intolerant.) If I had listened to the 'professionals', who had a vested interest in my coming back more times and giving blood samples and the like (as they'd stop getting money if I didn't keep showing up), I'd probably be miserable to this day, as it felt like my innards were ripping themselves apart, and I couldn't travel without... let's just say 'many unpleasant problems', which we'll compare to your restarting every 10 minutes.

        No one has been given enough information to make an authoritative recommendation to solve your problem with anything more than a band-aid to stabilize it so that you can do some more in-depth analysis and find the real problem. Writing software can be an iterative process: as you find problems, they can be fixed, and as the parameters change (in your case, the number of hits per minute), the program may have to be changed to deal with conditions it wasn't expecting when it was initially written.

        The problem with mod_perl is that the program is left running through many, many invocations. If you put a program with a memory leak under mod_perl, it will leak with every invocation, until it occupies all of the physical memory on the system and starts paging. It's also possible that you just have a lot of different programs on your system, calling many varied modules, some of which use a fair amount of memory, so that as the Apache child processes run each of the scripts they use more and more memory (not necessarily a leak, just not a good candidate for mod_perl).
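        A made-up example of the kind of code that is harmless under mod_cgi but grows forever under mod_perl:

            package My::Script;          # hypothetical
            my %log_by_user;             # file-scoped: persists across requests under mod_perl

            sub handle_request {
                my ($user, $line) = @_;
                push @{ $log_by_user{$user} }, $line;   # never pruned; under mod_cgi the
                                                        # process exits, so nobody notices
            }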

        Without having seen your configuration, or knowing anything other than what you have said, I will state that I do not believe that placing more memory in the server will be a long term fix. Just like putting more oil into a car that's leaking only helps for so long, you're very likely to run into this same problem in the future.

        I bitched at BrainBench for their 'Webserver Administrator' test, because they had questions where they gave limited details and asked 'what is sure to fix the problem', with answers like 'add more memory', 'use RAID', and the like ... this is a similar situation. There is no silver bullet fix to every situation, and there are too many unknowns to assure you that any given solution will fix the problem.

        The load refers to how many processes are ready to run on average when the scheduler has to pick someone. If the system is on its knees and load is low, then you have a bottleneck elsewhere. It seems likely that your system is heavily in swap, but it is also possible that you are doing something else silly like disconnecting from the database with every request. (That particular silliness should not bring you below the performance of mod_cgi, and you show other signs of memory problems.)

        I would still go with my previous recommendation as an obvious first step. After you do that you should be able to drop the number of mod_perl Apache kids substantially and maintain throughput. It would also be good, longer term, to audit the code looking for places where memory is likely to be allocated and not freed at the end of each request. Devel::Leak may help you do that.
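        The usual pattern with Devel::Leak (a sketch; run_one_request() here is a stand-in for whatever exercises the suspect code path) is to snapshot the live SVs, run the code, and see what's new:

            use Devel::Leak;

            my $handle;
            my $before = Devel::Leak::NoteSV($handle);    # remember every SV alive now

            run_one_request();                            # hypothetical: exercise the code

            my $after = Devel::Leak::CheckSV($handle);    # dumps SVs created since NoteSV
            warn "grew by ", $after - $before, " SVs\n";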

        I'm not entirely sure what you mean by "automatic restart of Apache every 10 minutes". If you mean that Apache has to stop and launch again, that suggests your machine is running out of memory and random processes are being killed. This is very bad. If you mean that Apache children are dying that frequently, that is fairly harmless. Think of it this way: if your average Apache child lasts 10 minutes and you have 20 of them, then one child is being launched every 30 seconds. Compare that with the 45 processes being launched per second under mod_cgi and you see the improvement. The time needed to launch processes is not going to be an issue until they die a lot faster than that!

        Incidentally once you are on mod_perl, writing real mod_perl handlers rather than using Apache::Registry should both considerably boost performance and save memory. (I don't know that you're using that, but it is a pretty safe guess.)
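        For comparison, a bare-bones mod_perl 1 handler looks like this (a sketch; the package name and response are placeholders):

            package My::Handler;
            use strict;
            use Apache::Constants qw(OK);

            sub handler {
                my $r = shift;
                $r->send_http_header('text/plain');
                $r->print("handled without Apache::Registry\n");
                return OK;
            }
            1;

        It gets wired up in httpd.conf with SetHandler perl-script and PerlHandler My::Handler inside a <Location> block; there's no per-request compilation or Registry namespace juggling, which is where the speed and memory savings come from.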