CGI Out of Control

katzuma has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: CGI Out of Control by tachyon (Chancellor) on Feb 10, 2003 at 20:10 UTC
Apache 1.3.22 has some issues. They are probably unrelated, but you should consider upgrading to 1.3.27.

If you are letting every man and his dog write and run CGI scripts, you are asking for trouble. It could be that someone has written a script like this:

```perl
#!/usr/bin/perl
fork() while 1   # this will fork 'em
```

which will bring any server to its knees. Perhaps someone's CGI is acting as a spam gateway that is being used to hammer your server.

It seems like you should be looking towards instituting some decent process/memory/CPU load monitoring, with a view to seeing what happens just before your server crashes. I presume you have gone as far as checking the httpd log files and know about top? That is as good a place as any to start.

Here is a really basic Perl monitoring tool for you at a bargain basement price:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $logfile   = '/var/log/top';
my $max_size  = 10**6;   # rotate once a log exceeds this many bytes
my $max_files = 10;      # number of logs in the circular rotation
my $delay     = 2;       # seconds between snapshots
my $count     = 0;
my $num       = 0;

while (1) {
    my $time = scalar localtime;
    my $file = "$logfile$num.log";

    # rotate logfiles so they don't get too big
    if ( -e $file and -s $file > $max_size ) {
        $count++;
        $num  = $count % $max_files;
        $file = "$logfile$num.log";
        unlink $file if -e $file;   # overwrite the oldest log
    }

    # -b runs top in batch mode so it works without a terminal
    my $top = `top -b -n1`;

    open my $log, '>>', $file or die "Can't write $file: $!\n";
    print $log $time, "\n", $top, "\n\n";
    close $log;

    sleep $delay;
}
```

Just background this with & and check the logs after a crash. Adjust the logfile size/number and sleep granularity to suit yourself. An average top record will be about 2K, so with a 2 second granularity you write roughly 500 records per 1MB file and fill one about every 15 to 20 minutes. 10 files lets you monitor the last 3 hours or so. You can munge the data to your heart's content. You will be most interested in the last file written pre-crash, which will usually also be the smallest as it will only be partially written. Note that the rotation is circular, with old logs overwritten.
You will end up with a file full of this:

```
Mon Feb 10 14:41:55 2003
  2:47pm  up 160 days,  1:22,  3 users,  load average: 0.40, 0.26, 0.19
28 processes: 25 sleeping, 1 running, 0 zombie, 2 stopped
CPU0 states:  0.1% user,  1.0% system,  0.0% nice, 97.0% idle
CPU1 states:  0.0% user,  1.0% system,  0.0% nice, 98.0% idle
Mem:  1551228K av, 1539476K used,   11752K free,       0K shrd,   79828K buff
Swap: 1534072K av,  768592K used,  765480K free                  452784K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
26290 root      16   0  1248 1228  1024 S     0.9  0.0   0:00 sshd
 5031 root      20   0  1016 1016   828 R     0.9  0.0   0:00 top
    1 root       8   0   424  388   372 S     0.0  0.0   1:29 init
29045 root       9   0   356  304   288 S     0.0  0.0   0:22 syslogd
29134 root       9   0   376  272   236 S     0.0  0.0   0:00 sshd
29160 root       9   0   468  356   288 S     0.0  0.0   0:05 xinetd
29188 root       9   0   220   56    56 S     0.0  0.0   0:00 safe_mysqld
29234 mysql     13   5  3908 1484  1336 S N   0.0  0.0   0:00 mysqld
29244 mysql     13   5  3908 1484  1336 S N   0.0  0.0   0:52 mysqld
29245 mysql     13   5  3908 1484  1336 S N   0.0  0.0   0:00 mysqld
29246 root       9   0   488  312   296 S     0.0  0.0   2:17 httpd
29247 mysql     13   5  3908 1484  1336 S N   0.0  0.0   0:00 mysqld
29277 root       9   0   168  128    88 S     0.0  0.0   0:07 crond
28116 apache     9   0  1092 1040   876 S     0.0  0.0   0:00 httpd
28117 apache     9   0  1084 1024   872 S     0.0  0.0   0:00 httpd
28118 apache     9   0  1092 1040   888 S     0.0  0.0   0:00 httpd
[snip]
```

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
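As one way to munge those logs, here is a minimal sketch that reduces each snapshot to a single line: the timestamp, the 1-minute load average, and the process using the most CPU. It assumes the log layout written by the monitoring script above (a `scalar localtime` line followed by top's output) and the column order shown in the sample. The names `parse_top_log` and `busiest_process` are made up for this illustration, and the patterns may need adjusting for other versions of top.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one log's text into snapshots. Each snapshot starts with the
# "scalar localtime" line that the monitoring script prints before top's output.
sub parse_top_log {
    my ($text) = @_;
    my ( @snapshots, $current );

    for my $line ( split /\n/, $text ) {
        # e.g. "Mon Feb 10 14:41:55 2003"
        if ( $line =~ /^\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d \d{4}$/ ) {
            $current = { time => $line, load => undef, procs => [] };
            push @snapshots, $current;
            next;
        }
        next unless $current;

        # 1-minute load average from top's summary line
        if ( $line =~ /load average:\s*([\d.]+)/ ) {
            $current->{load} = $1;
        }
        # Process rows: PID USER ... %CPU %MEM TIME COMMAND
        # (STAT may contain a space, e.g. "S N", so match from the right)
        elsif ( $line =~ /^\s*(\d+)\s+(\S+)\s.*?\s(\d+\.\d+)\s+(\d+\.\d+)\s+\d+:\d+\S*\s+(\S.*)$/ ) {
            push @{ $current->{procs} },
              { pid => $1, user => $2, cpu => $3, mem => $4, command => $5 };
        }
    }
    return @snapshots;
}

# Return the process using the most CPU in a snapshot (undef if none parsed).
sub busiest_process {
    my ($snapshot) = @_;
    my ($top) = sort { $b->{cpu} <=> $a->{cpu} } @{ $snapshot->{procs} };
    return $top;
}

# Report one line per snapshot for every log file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't read $file: $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    next unless defined $text;

    for my $snap ( parse_top_log($text) ) {
        my $busy = busiest_process($snap) or next;
        printf "%s  load %s  busiest: %s (pid %s, %s%% CPU)\n",
          $snap->{time}, $snap->{load} // '?', @{$busy}{qw(command pid cpu)};
    }
}
```

Run it against the last log written before the crash (the script name here is arbitrary), for example `perl topsummary.pl /var/log/top3.log`, and look for the point where the load average starts climbing and which command is at the top when it does.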
Re: CGI Out of Control by mowgli (Friar) on Feb 10, 2003 at 20:18 UTC
I'm not sure whether this will help, and I wouldn't really call myself experienced yet either, but how about the following?

- What does it mean that Apache goes "temporarily down"? Does it crash? Lock up?
- Does it come back up on its own, or do you have to restart it?
- When it is down, does it stop responding completely, or does it just slow down or something similar?
- What do your logs say, especially your error log?
- What OS are you using?
- Does it affect all sites you are hosting?
- Does it affect all of the CGI programs?
- Does it affect all of the CGI programs using DBI / DB_File::Lock?
- How long has this problem been persisting?
- Have you made any software changes to the system?
- Have you made any hardware changes to the system?
- Can you observe any other malfunctioning on the system?
- Did you check the scripts being used for races, potential deadlocks etc.?
- Do you trust your users?

Also, can you use one of the hosted sites for debugging, or do they all have to be up for as much time as possible? And furthermore, have you tried DB_File::Lock 0.05?

I hope this helps a little bit at least.

-- mowgli
Re: CGI Out of Control by l2kashe (Deacon) on Feb 10, 2003 at 21:07 UTC
I've had similar experiences. Granted, this isn't always the cause, but the times I have investigated something along these lines, I have found a customer with a poorly (perhaps intentionally) written formmail CGI which doesn't verify how it got the data to send, or even who it's sending to. Some people come along, realize this, and have just found themselves a nifty, free remailer, which will pass most spam checking because the mail originated within the network (provided it targets local recipients). The first thing I do now in situations like yours is parse the messages file to see how much mail was pushed in a given time frame, and whether that correlates with the downtime.

Just my .02 USD

/* And the Creator, against his better judgement, wrote man.c */
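A minimal sketch of that check: count accepted messages per hour, so that spikes can be compared against the times the server went down. It assumes syslog-style lines in which each accepted message is logged with a `from=` field, as sendmail and Postfix typically do. The sub name `mail_per_hour` and the threshold value are invented for this illustration and should be adapted to your own MTA and traffic.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count accepted messages per hour from syslog-style mail log lines.
# ASSUMPTION: each accepted message produces one line containing "from=";
# adjust the pattern if your MTA logs differently.
sub mail_per_hour {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        next unless $line =~ /\bfrom=/;
        # syslog timestamp, e.g. "Feb 10 14:41:55 host ..."; keep "Feb 10 14"
        next unless $line =~ /^(\w{3}\s+\d+\s+\d\d):\d\d:\d\d\s/;
        $count{$1}++;
    }
    return %count;
}

# Hypothetical cut-off; tune it to your normal hourly volume.
my $threshold = 500;

# Only read input when log files are named on the command line.
if (@ARGV) {
    my %count = mail_per_hour(<>);
    # Keys are sorted as strings, which is good enough within a single day.
    for my $hour ( sort keys %count ) {
        print "$hour:00  $count{$hour} messages\n"
          if $count{$hour} > $threshold;
    }
}
```

For example, `perl mailrate.pl /var/log/messages` (the script name is arbitrary) prints only the hours that exceed the threshold. If those hours line up with the outages, the next step is finding which CGI is handing the mail to the MTA.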
Re: CGI Out of Control by jobber (Sexton) on Feb 10, 2003 at 20:40 UTC
Hello,

You could always look into the setuid CGI Apache module. This makes it a lot easier to track and administer CGI scripts, since they run as the user that owns them. You could then restrict the processor time and the number of forks each user can have, since the scripts are no longer running as nobody. This has its downsides for security, though, since the scripts now run as a real user. Hope this helps.
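One way to apply such limits is in Apache itself, using the core `RLimitCPU`, `RLimitMEM` and `RLimitNPROC` directives, which apply to processes forked by Apache children, such as CGI scripts. The fragment below is only a sketch for httpd.conf and the numbers are illustrative assumptions, not recommendations. Each directive takes a soft limit and an optional hard limit.

```apache
# CPU seconds per CGI process (soft, hard)
RLimitCPU 30 60

# Bytes of memory per CGI process (32 MB soft, 64 MB hard)
RLimitMEM 33554432 67108864

# Number of processes per user (soft, hard)
RLimitNPROC 20 30
```

Note that `RLimitNPROC` counts processes per user ID. If CGI scripts run as the web server user, the limit also constrains the server's own processes, so it is mainly useful once scripts run under their owners' accounts through a setuid wrapper, as described above.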
Re: CGI Out of Control by derby (Abbot) on Feb 10, 2003 at 20:11 UTC
Well ... what do the log files say?

-derby