in reply to How to debug "Too Many Open Files" Error

The problem is that there are too many files open. Without knowing the exact message, and which messages your system defines, I can't tell you whether the limit being exceeded is the number of files your process may have open, or a system-wide one. At a guess, if the message doesn't say anything about the system, then it is your process. But to be sure, look through the possible system messages and figure out which one you're getting. This one-liner will print them all:
perl -le 'print($! = ++$_) until $! =~ /^Unknown/'
On my system two of the messages are:
Too many open files in system
Too many open files
The first message would indicate a global problem on my machine, the second a problem specific to my process.
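You can also tell the two cases apart in code: the %! hash (from the core Errno module) lets you test errno by name instead of matching message strings. A minimal sketch, where the path is just a placeholder:

```perl
use Errno;   # makes the %! hash available

# The path here is only an example.
unless (open my $fh, '<', '/some/path') {
    if    ($!{EMFILE}) { warn "per-process limit hit: $!\n" }   # "Too many open files"
    elsif ($!{ENFILE}) { warn "system-wide limit hit: $!\n" }   # "Too many open files in system"
    else               { die  "open failed: $!\n" }
}
```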

If it is global, you need to look at what is happening on the machine as a whole. It may not be easily fixable (though on Linux the limits can be raised, as others have told you). If it is local to your process, then the change in behaviour is driven by your data: somehow you're opening files and not closing them. Without looking at your code I can't tell you where that is happening, but what you need to do is close some of your open files.
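One low-tech way to make sure files get closed is to keep lexical filehandles in a small scope, so they close themselves. A sketch, with placeholder paths and a hypothetical process() routine:

```perl
# Lexical filehandles are closed automatically when they go out of
# scope, so a loop like this never accumulates open descriptors.
for my $path (@paths) {
    open my $fh, '<', $path or die "can't open $path: $!";
    while (my $line = <$fh>) {
        process($line);   # hypothetical routine
    }
}   # $fh goes out of scope here and is closed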

You can do this explicitly, or use the built-in FileCache module to close and reopen filehandles behind the scenes. Do note that it has a limitation: it does not restore the position of filehandles that it closes and reopens. If that matters to you, I would create a tied class that opens and closes files the way FileCache does, but also keeps track of positions.
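A minimal FileCache sketch, assuming you are writing to many output files; the maxopen value, paths, and @records data are illustrative:

```perl
use FileCache maxopen => 16;   # keep only ~16 handles open at once

# cacheout() opens a path on first use and, once over the limit,
# transparently closes old handles and reopens them for append
# when they are used again.
for my $record (@records) {    # hypothetical data
    my $fh = cacheout "out/$record->{bucket}.log";
    print $fh $record->{line}, "\n";
}
```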

If the problem is global, it might still be the fault of your process, and aggressively avoiding having many files open at once is not a bad idea. But the odds are that your process isn't the culprit; more likely it is a matter of system load, with some process somewhere using far more filehandles than it needs.

Replies are listed 'Best First'.
Re^2: How to debug "Too Many Open Files" Error
by tonyb48 (Novice) on Sep 16, 2008 at 23:30 UTC
    Thanks for the very thorough answer. Here is a typical extract from my error log:
    [Tue Sep 16 06:47:33 2008] [error] [client 65.5.128.20] (24)Too many open files: couldn't spawn child process: /var/www/vhosts/cfhs69.com/cgi-bin/alumni001A.pl
    Based on your comment, this would imply that there are too many open files in my process, not in the system. The difficulty for me is that I do not open files directly. I have one OPEN command that is for MAIL, and this module has not been involved in the failures. I use some other PERL modules that may open files, such as Authen::Captcha, but I assume they are well behaved. And I only use this once per user. And I only have 88 users and I would be surprised if there is ever more than 2 using the site at once. Tony
      The full error gives some extremely useful context. Something cannot run the command /var/www/vhosts/cfhs69.com/cgi-bin/alumni001A.pl. Glancing at your website, I'm going to bet that that command is what Apache runs to serve the page http://www.cfhs69.com/cgi-bin/alumni001A.pl, which tells me that the error message is almost certainly coming from Apache, and not your code.

      Googling around, I am willing to bet that it has to do with the problem explained in http://httpd.apache.org/docs/1.3/misc/FAQ.html#fdlim: namely, that your hosting provider is trying to run too many virtual hosts out of Apache, with the result that sometimes an Apache process has opened too many files, connections, etc., and can't open another. (They may be using Apache 2, but the same issue still exists because it is an operating system limit, not an Apache limit.)

      In which case the possible solutions are that they can raise the operating system limit as is documented there, they can run fewer virtual hosts per server, or you can find a more competent hosting provider.

      PS: Please call the language Perl, not PERL. See the FAQ for verification. Calling it by the wrong name grates on people who are competent with the language. So unless you actively want to irritate people here, you really should call it by the right name.

        Just to add to the possible solutions bit. Another popular solution is to configure Apache to write to its access and error logs via pipes.
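A sketch of what that configuration might look like, assuming Apache's stock split-logfile support script; all vhosts share one piped log (with the vhost name recorded via %v) instead of each holding its own log descriptors:

```
# Hypothetical httpd.conf fragment: one shared, piped access log for
# every vhost, split back out per-vhost afterwards by split-logfile.
LogFormat "%v %h %l %u %t \"%r\" %>s %b" vcommon
CustomLog "|/usr/local/apache/bin/split-logfile" vcommon
```

The exact paths depend on the installation; the point is that descriptors scale with the number of pipes, not the number of vhosts.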

        Thanks very much for this answer. It will help me have a more constructive dialog with my hosting service. They offered to migrate my sites (I have a second similar site for a different alumni body, which is having the same problems) to a different server with less traffic. I will refer them to the article you referenced. I will ask them to run the commands:
        echo 8192 > /proc/sys/fs/file-max
        echo 32768 > /proc/sys/fs/inode-max
        ulimit -a
        ulimit -n
        ulimit -H
        I would also like to try inserting
        warn `ls -l /proc/$$/fd/` . ' ';
        into the 3 most popular modules in the script. Finally, thanks for the suggestions on not capitalizing Perl. I had no idea. The last thing I want to do is offend the very people from whom I need help.
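The same check can be done from Perl without shelling out, which may be handier to drop into a module. A Linux-specific sketch (it assumes /proc is available):

```perl
# Count this process's open descriptors via /proc/$$/fd.
opendir my $dh, "/proc/$$/fd" or die "no /proc/$$/fd: $!";
my @fds = grep { /\A\d+\z/ } readdir $dh;
warn "PID $$ has ", scalar(@fds), " descriptors open\n";
```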
      Are you sure the alumninnnX.pl progs are exiting properly and not hanging?

        That's a good call since you appear to be using CGI. Run top or ps and look for many processes of the same name, usually http or perl or the name of one of your scripts. Moreover if they are not exiting properly they will be marked as defunct.
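A rough sketch of that check from Perl, scanning ps output for lingering or defunct workers; the script-name pattern is just an example taken from this thread:

```perl
# Flag processes matching the CGI script names, or marked defunct.
my @suspects = grep { /alumni\d+\w*\.pl|<defunct>/ } `ps aux`;
warn "possibly stuck processes:\n", @suspects if @suspects;
```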