Apache 1.3.22 has some known issues. They are probably unrelated, but you should consider upgrading to 1.3.27.
If you are letting every man and his dog write and run CGI scripts you are asking for trouble. It could be that someone has written a script like this:
#!/usr/bin/perl
fork() while 1 # this will fork 'em
which will bring any server to its knees. Perhaps you have someone's CGI acting as a spam gateway that is being used to hammer your server.
It seems like you should be looking towards instituting some decent process/memory/cpu load monitoring with a view to seeing what happens just before your server crashes. I presume you have gone as far as checking the httpd/log files and know about top?
This is as good a place as any to start.
Here is a really basic Perl monitoring tool for you at a bargain basement price:
#!/usr/bin/perl
use strict;
use warnings;

my $logfile   = '/var/log/top';
my $max_size  = 10**6;   # rotate once a log exceeds this many bytes
my $max_files = 10;      # number of logs in the circular set
my $delay     = 2;       # seconds between samples
my $count     = 0;
my $num       = 0;

while (1) {
    my $time = scalar localtime;
    # rotate logfiles so they don't get too big
    if ( -e "$logfile$num.log" and -s "$logfile$num.log" > $max_size ) {
        $count++;
        $num = $count % $max_files;
        unlink "$logfile$num.log" if -e "$logfile$num.log";
    }
    # -b (batch mode) makes top emit plain text when it has no terminal
    my $top = `top -b -n1`;
    open my $log, '>>', "$logfile$num.log"
        or die "Can't write $logfile$num.log: $!\n";
    print $log $time, "\n", $top, "\n\n";
    close $log;
    sleep $delay;
}
Just background this with & and check the logs after a crash. Adjust the logfile size/number and sleep granularity to suit yourself. An average top record will be about 2K, so at a 2-second granularity you will fill a 1MB file roughly every 20 minutes, and 10 files let you monitor the last 3 hours or so. You can munge the data to your heart's content. You will be most interested in the last file written pre-crash, which will usually also be the smallest as it will only be partially written. Note that the rotation is circular, with old logs overwritten.
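If you want to check the numbers for your own settings, here is a rough sketch of the arithmetic. The sub name retention_seconds is made up for this example, and the 2000-byte record size is only the ballpark guess from above, not a measurement, so treat the result as an estimate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Estimate how many seconds of history the circular log set holds.
# $record_size is an assumed average size of one top record in bytes.
sub retention_seconds {
    my ($max_size, $max_files, $delay, $record_size) = @_;
    my $records_per_file = int($max_size / $record_size);
    return $records_per_file * $delay * $max_files;
}

# Same settings as the monitor script: 1MB files, 10 files, 2 second delay
my $secs = retention_seconds(10**6, 10, 2, 2000);
printf "about %.1f hours of history\n", $secs / 3600;
```

This ignores the time top itself takes to run, so in practice the window will be a bit longer than the estimate.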
You will end up with a file full of this:
Mon Feb 10 14:41:55 2003
2:47pm up 160 days, 1:22, 3 users, load average: 0.40, 0.26, 0.19
28 processes: 25 sleeping, 1 running, 0 zombie, 2 stopped
CPU0 states: 0.1% user, 1.0% system, 0.0% nice, 97.0% idle
CPU1 states: 0.0% user, 1.0% system, 0.0% nice, 98.0% idle
Mem: 1551228K av, 1539476K used, 11752K free, 0K shrd, 79828K buff
Swap: 1534072K av, 768592K used, 765480K free 452784K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
26290 root 16 0 1248 1228 1024 S 0.9 0.0 0:00 sshd
5031 root 20 0 1016 1016 828 R 0.9 0.0 0:00 top
1 root 8 0 424 388 372 S 0.0 0.0 1:29 init
29045 root 9 0 356 304 288 S 0.0 0.0 0:22 syslogd
29134 root 9 0 376 272 236 S 0.0 0.0 0:00 sshd
29160 root 9 0 468 356 288 S 0.0 0.0 0:05 xinetd
29188 root 9 0 220 56 56 S 0.0 0.0 0:00 safe_mysqld
29234 mysql 13 5 3908 1484 1336 S N 0.0 0.0 0:00 mysqld
29244 mysql 13 5 3908 1484 1336 S N 0.0 0.0 0:52 mysqld
29245 mysql 13 5 3908 1484 1336 S N 0.0 0.0 0:00 mysqld
29246 root 9 0 488 312 296 S 0.0 0.0 2:17 httpd
29247 mysql 13 5 3908 1484 1336 S N 0.0 0.0 0:00 mysqld
29277 root 9 0 168 128 88 S 0.0 0.0 0:07 crond
28116 apache 9 0 1092 1040 876 S 0.0 0.0 0:00 httpd
28117 apache 9 0 1084 1024 872 S 0.0 0.0 0:00 httpd
28118 apache 9 0 1092 1040 888 S 0.0 0.0 0:00 httpd
[snip]
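As a starting point for munging, here is a minimal sketch that pulls the timestamp and 1-minute load average out of each record. It assumes every record begins with a localtime line like "Mon Feb 10 14:41:55 2003" followed by top's "load average:" line, as in the sample above. The sub name load_history and the script name loadhist.pl are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of [timestamp, 1-minute load] pairs read from a log handle.
sub load_history {
    my ($fh) = @_;
    my ($stamp, @history);
    while (my $line = <$fh>) {
        chomp $line;
        # localtime line written by the monitor, e.g. "Mon Feb 10 14:41:55 2003"
        if ($line =~ /^\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}$/) {
            $stamp = $line;
        }
        # first "load average:" line after a timestamp belongs to that record
        elsif (defined $stamp and $line =~ /load average:\s*([\d.]+)/) {
            push @history, [ $stamp, $1 ];
            undef $stamp;
        }
    }
    return @history;
}

# Usage: perl loadhist.pl /var/log/top0.log
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Can't read $ARGV[0]: $!\n";
    printf "%s  %s\n", @$_ for load_history($fh);
    close $fh;
}
```

Run it over the last file written before the crash and look for where the load starts climbing, then read the process list in the records around that time.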
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print