in reply to Re: Apache / 30,000 Perl script serving limit
in thread Apache / 30,000 Perl script serving limit

Verily so,
apache-default error log contains:

[Wed May 06 10:25:40 2009] [error] (11)Resource temporarily unavailable: couldn't create child process: 11: environment.pl

the virtual server error log has:

[Wed May 06 10:25:40 2009] [error] [client 198.73.83.73] Premature end of script headers: environment.pl

I have googled "Resource temporarily unavailable: couldn't create child process" quite extensively, but found no one with a similar problem.

Replies are listed 'Best First'.
Re^3: Apache / 30,000 Perl script serving limit
by almut (Canon) on May 06, 2009 at 15:32 UTC

    A quick googling found this for me. I haven't investigated this any further yet, but it seems to hint at the "maxuprc" kernel parameter, or some related setting. Check with /usr/sbin/kmtune what your settings are...

      Almut, I must thank you for your efforts; they have put me onto a direction of investigation I had not yet discovered.

      I have found our maxuprc parameter was set to 256, and we have increased it to 1024. Re-running the test script (another round of 30k Perl script requests), it started giving the 500 Internal Server Error at the same iteration (the 30,000th). Monitoring the OS, we found the total number of processes in use each hour (at XX:00) never exceeded 10, representing about 0.9% usage.

      Anyhow, this experiment still allowed us to rule out (hopefully not mistakenly) the possibility that the maxuprc parameter has any effect on this problem.

Re^3: Apache / 30,000 Perl script serving limit
by almut (Canon) on May 06, 2009 at 19:02 UTC
    (11)Resource temporarily unavailable: couldn't create child process: 11: environment.pl

    That error message is produced by this snippet in mod_cgi(d).c of the Apache sources (in routine run_cgi_child()):

    rc = ap_os_create_privileged_process(r, procnew, command, argv, env,
                                         procattr, p);
    if (rc != APR_SUCCESS) {
        /* Bad things happened. Everyone should have cleaned up. */
        ap_log_rerror(APLOG_MARK, APLOG_ERR|APLOG_TOCLIENT, rc, r,
                      "couldn't create child process: %d: %s", rc,
                      apr_filename_of_pathname(r->filename));
    }

    If you dig a bit deeper, you'll find that eventually fork() is being called (unsurprisingly), i.e.

    if ((new->pid = fork()) < 0) {
        return errno;
    }

    in ./srclib/apr/threadproc/unix/proc.c, in the routine apr_proc_create().

    The errno eventually ends up in rc, which is being reported as 11 (numerically), or as "Resource temporarily unavailable" (text form). The corresponding symbolic form is EAGAIN, which - if you look in HP-UX's fork(2) manpage - is being returned under these two circumstances:

    ERRORS
        If fork() fails, errno is set to one of the following values.

        [EAGAIN]  The system-imposed limit on the total number of
                  processes under execution would be exceeded.

        [EAGAIN]  The system-imposed limit on the total number of
                  processes under execution by a single user would
                  be exceeded.

    The former limit is the "nproc" setting (unlikely to be the cause here, as you can still run other programs); the latter limit is the already mentioned "maxuprc" tunable.  In other words, I'd say the theory fits too well to be ruled out completely, yet... :)
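    To make the mapping from the logged "11" to EAGAIN concrete, here is a minimal C sketch (not from the thread; the numeric value of EAGAIN is platform-specific, though it happens to be 11 on both HP-UX and Linux):

    ```c
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* The "11" in the Apache error_log is the raw errno returned by
         * fork(); strerror() yields the text form seen in the log. */
        printf("EAGAIN = %d\n", EAGAIN);
        printf("strerror(EAGAIN) = %s\n", strerror(EAGAIN));
        return 0;
    }
    ```

    Compiling and running this on a typical Linux or HP-UX box should print the same number and message that appear in the error_log entry above.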

    Monitoring the OS, we found the total number of processes in use each hour (at XX:00) never exceeded 10, representing about 0.9% usage.

    How exactly did you investigate this?  Are you sure you don't have any zombies lingering around, or some such. What do ps and top say when the limit has been reached?  Is the limit exactly 30000, or maybe 30K, with K being 1024? Do you really need to reboot the machine, or is simply restarting Apache sufficient? (use stop, start - not restart or graceful - to be sure to actually get a new process for the Apache parent)

      I can tell you are very knowledgeable in these things, Almut. I will try to provide answers to these questions as best as I can. As for the number of processes used each hour, that information was given to me by a system admin. We used HP-UX kcusage (watching maxuprc), and while the test was running we watched the nproc value; it barely increased.

      I shall also enquire into the info in ps and top; I have not yet done this.

      Finally, the restart is merely a graceful restart from the Webmin console, and all is well thereafter for exactly another 30,000 requests.

      To see if it changes anything, I will now proceed to Stop and Start the server instead.

        Just to follow-up on the restarting: if "graceful" already suffices to reset the problem, using "stop" and "start" instead wouldn't contribute anything to solving the issue... — I guess I just misread your "require a reboot" to mean the machine would need to be rebooted (which would've surprised me...)

      Alright, same thing after the stop and start as you already figured.

      Perhaps worthy of notice is that it's a new PID every so often.

      www 11004  7696  2 15:46:29 ?  0:00 /usr/bin/perl /web_sites/intranet2/cgi-bin/environment.pl
      www 11067  7696  4 15:46:31 ?  0:00 /usr/bin/perl /web_sites/intranet2/cgi-bin/environment.pl
      www 11210  7696  2 15:46:37 ?  0:00 /usr/bin/perl /web_sites/intranet2/cgi-bin/environment.pl
      www 11264  7696  4 15:46:39 ?  0:00 /usr/bin/perl /web_sites/intranet2/cgi-bin/environment.pl

        What seems to be more interesting than that they do get a new PID (which is to be expected with mod_cgid) is that they're still running after several seconds (observe the STIME column) — presuming the output you've shown is of a single ps call, of course.  In case environment.pl is just printing out the environment, this shouldn't take several seconds... so I would try to find out why that is...

        BTW, even on HP-UX, you can get a nice tree-like display of processes with ps (similarly to what option -f does on Linux). This is often helpful to easily see which processes forked which...  For this, you'd need to enable XPG4 mode, which you do via the env variable UNIX95. This makes option -H available, so you can then write, for example

        $ UNIX95=1 ps -efH

        (not sure if you're aware of it, so I thought it might be worth mentioning...)

Re^3: Apache / 30,000 Perl script serving limit
by ikegami (Patriarch) on May 06, 2009 at 18:41 UTC
    A common cause is forking lots of children without reclaiming the zombies using wait or waitpid.
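    The point can be illustrated with a small C sketch (a hypothetical example, not Apache's code): a child that exits but is never waited on lingers as a zombie, occupying a process-table slot that counts toward limits like nproc/maxuprc until the parent reaps it.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int reaped = 0;
        int i;

        /* Fork a handful of short-lived children... */
        for (i = 0; i < 5; i++) {
            pid_t pid = fork();
            if (pid < 0) {
                perror("fork");   /* EAGAIN would show up here */
                exit(1);
            }
            if (pid == 0)
                _exit(0);         /* child exits immediately */
        }

        /* ...then reap them. Without this loop, each child would
         * remain a zombie (shown as <defunct> in ps) until the
         * parent itself exits. */
        while (waitpid(-1, NULL, 0) > 0)
            reaped++;

        printf("reaped %d children\n", reaped);
        return 0;
    }
    ```

    If the reaping loop were omitted, a long-running parent would accumulate one zombie per fork, which would indeed show up as a steadily climbing nproc count.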

      I am told here that if that were the case, we would see the process-table count (nproc) climb like crazy, and it stayed pretty constant through the test.