in reply to Re: Apache / 30,000 Perl script serving limit
in thread Apache / 30,000 Perl script serving limit

Verily so,
apache-default error log contains:

[Wed May 06 10:25:40 2009] [error] (11)Resource temporarily unavailable: couldn't create child process: 11: environment.pl

the virtual server error log has:

[Wed May 06 10:25:40 2009] [error] [client 198.73.83.73] Premature end of script headers: environment.pl

I have googled "Resource temporarily unavailable: couldn't create child process" quite extensively, but found no one with a similar problem.

Replies are listed 'Best First'.
Re^3: Apache / 30,000 Perl script serving limit
by almut (Canon) on May 06, 2009 at 15:32 UTC

    A quick googling found this for me. I haven't investigated this any further yet, but it seems to hint at the "maxuprc" kernel parameter, or some related setting. Check with /usr/sbin/kmtune what your settings are...

      Almut, I must thank you for your efforts; they have put me onto a direction of investigation I had not yet discovered.

      I have found our maxuprc parameter was set to 256, and we have increased it to 1024. Re-running the test script (another round of 30k Perl script requests), it started giving the 500 Internal Server Error at the same iteration (the 30,000th). Monitoring the OS, we found the total number of processes in use each hour (at XX:00) never exceeded 10, representing about 0.9% usage.

      Anyhow, this experiment still allowed us to rule out (hopefully not mistakenly) the possibility that the maxuprc parameter has any effect on this problem.

Re^3: Apache / 30,000 Perl script serving limit
by almut (Canon) on May 06, 2009 at 19:02 UTC
    (11)Resource temporarily unavailable: couldn't create child process: 11: environment.pl

    That error message is produced by this snippet in mod_cgi(d).c of the Apache sources (in routine run_cgi_child()):

    rc = ap_os_create_privileged_process(r, procnew, command, argv, env,
                                         procattr, p);
    if (rc != APR_SUCCESS) {
        /* Bad things happened. Everyone should have cleaned up. */
        ap_log_rerror(APLOG_MARK, APLOG_ERR|APLOG_TOCLIENT, rc, r,
                      "couldn't create child process: %d: %s", rc,
                      apr_filename_of_pathname(r->filename));
    }

    If you dig a bit deeper, you'll find that eventually fork() is being called (unsurprisingly), i.e.

    if ((new->pid = fork()) < 0) {
        return errno;
    }

    in ./srclib/apr/threadproc/unix/proc.c, in the routine apr_proc_create().

    The errno eventually ends up in rc, which is being reported as 11 (numerically), or as "Resource temporarily unavailable" (text form). The corresponding symbolic form is EAGAIN, which - if you look in HP-UX's fork(2) manpage - is being returned under these two circumstances:

    ERRORS
        If fork() fails, errno is set to one of the following values.

        [EAGAIN]  The system-imposed limit on the total number of
                  processes under execution would be exceeded.

        [EAGAIN]  The system-imposed limit on the total number of
                  processes under execution by a single user would
                  be exceeded.

    The former limit is the "nproc" setting (unlikely to be the cause here, as you can still run other programs); the latter limit is the already mentioned "maxuprc" tunable.  In other words, I'd say the theory fits too well to be ruled out completely, yet... :)
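    To make the mapping from the logged "11" to EAGAIN concrete, here is a minimal C sketch (not from the thread; the numeric value of EAGAIN is platform-specific, though it happens to be 11 on both HP-UX and Linux):

    ```c
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* The "11" in the Apache error_log is the raw errno returned by
         * fork(); strerror() yields the text form seen in the log. */
        printf("EAGAIN = %d\n", EAGAIN);
        printf("strerror(EAGAIN) = %s\n", strerror(EAGAIN));
        return 0;
    }
    ```

    Compiling and running this on a typical Linux or HP-UX box should print the same number and message that appear in the error_log entry above.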

    Monitoring the OS, we found the total number of processes in use each hour (at XX:00) never exceeded 10, representing about 0.9% usage.

    How exactly did you investigate this?  Are you sure you don't have any zombies lingering around, or some such. What do ps and top say when the limit has been reached?  Is the limit exactly 30000, or maybe 30K, with K being 1024? Do you really need to reboot the machine, or is simply restarting Apache sufficient? (use stop, start - not restart or graceful - to be sure to actually get a new process for the Apache parent)

      I can tell you are very knowledgeable in these things, Almut. I will try to provide answers to these questions as best as I can. As for the number of processes used each hour, that information was given to me by a system admin. We used HP-UX kcusage (watching maxuprc), and while the test was running we watched the nproc value; it barely increased.

      I shall also enquire into the info in ps and top; I have not yet done this.

      Finally, the restart is merely a graceful restart from the Webmin console, and all is well thereafter for exactly another 30,000 requests.

      To see if it changes anything, I will now proceed to Stop and Start the server instead.

        Just to follow-up on the restarting: if "graceful" already suffices to reset the problem, using "stop" and "start" instead wouldn't contribute anything to solving the issue... — I guess I just misread your "require a reboot" to mean the machine would need to be rebooted (which would've surprised me...)

      Alright, same thing after the stop and start as you already figured.

      Perhaps worthy of notice is that it's a new PID every so often.

      www 11004  7696  2 15:46:29 ?  0:00 /usr/bin/perl /web_sites/intranet2/cgi-bin/environment.pl
      www 11067  7696  4 15:46:31 ?  0:00 /usr/bin/perl /web_sites/intranet2/cgi-bin/environment.pl
      www 11210  7696  2 15:46:37 ?  0:00 /usr/bin/perl /web_sites/intranet2/cgi-bin/environment.pl
      www 11264  7696  4 15:46:39 ?  0:00 /usr/bin/perl /web_sites/intranet2/cgi-bin/environment.pl

        What seems to be more interesting than that they do get a new PID (which is to be expected with mod_cgid) is that they're still running after several seconds (observe the STIME column) — presuming the output you've shown is of a single ps call, of course.  In case environment.pl is just printing out the environment, this shouldn't take several seconds... so I would try to find out why that is...

        BTW, even on HP-UX, you can get a nice tree-like display of processes with ps (similarly to what option -f does on Linux). This is often helpful to easily see which processes forked which...  For this, you'd need to enable XPG4 mode, which you do via the env variable UNIX95. This makes option -H available, so you can then write, for example

        $ UNIX95=1 ps -efH

        (not sure if you're aware of it, so I thought it might be worth mentioning...)

Re^3: Apache / 30,000 Perl script serving limit
by ikegami (Patriarch) on May 06, 2009 at 18:41 UTC
    A common cause is forking lots of children without reclaiming the zombies using wait or waitpid.
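    The point can be illustrated with a small C sketch (a hypothetical example, not Apache's code): a child that exits but is never waited on lingers as a zombie, occupying a process-table slot that counts toward limits like nproc/maxuprc until the parent reaps it.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int reaped = 0;
        int i;

        /* Fork a handful of short-lived children... */
        for (i = 0; i < 5; i++) {
            pid_t pid = fork();
            if (pid < 0) {
                perror("fork");   /* EAGAIN would show up here */
                exit(1);
            }
            if (pid == 0)
                _exit(0);         /* child exits immediately */
        }

        /* ...then reap them. Without this loop, each child would
         * remain a zombie (shown as <defunct> in ps) until the
         * parent itself exits. */
        while (waitpid(-1, NULL, 0) > 0)
            reaped++;

        printf("reaped %d children\n", reaped);
        return 0;
    }
    ```

    If the reaping loop were omitted, a long-running parent would accumulate one zombie per fork, which would indeed show up as a steadily climbing nproc count.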

      I am told here that if that were the case, we would see the process-table count (nproc) climb like crazy, and it stayed pretty constant through the test.