tonyb48 has asked for the wisdom of the Perl Monks concerning the following question:

I created a website for a high school class to share info and plan reunions: cfhs69.com. It worked fine a few months ago, and I have made no recent code changes, but 2 weeks ago my users began to get errors. My error log says "Too Many Open Files". The failure does not happen consistently: a function that triggers it now might not trigger it when run again a few minutes later. Although my hosting service is very cooperative, they cannot figure out what to do. Is there any experiment I can run to help me/them figure out the problem? Thanks, Tony

Replies are listed 'Best First'.
Re: How to debug "Too Many Open Files" Error
by moritz (Cardinal) on Sep 16, 2008 at 19:32 UTC
    The unix command lsof can be used to list the files a process currently has open; maybe that helps you.

    Maybe you could also check your code for pieces like this:

    open HANDLE, $filename;

    Where a glob (HANDLE) is used as a file handle, instead of a lexical variable:

    open my $handle, '<', $filename or die "Cannot open $filename: $!";

    With the latter the files are automatically closed when $handle goes out of scope, so a forgotten close does much less harm.
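    For instance, a minimal demonstration of that scoping behaviour (the file name here is just a stand-in):

        use strict;
        use warnings;

        {
            open my $handle, '<', 'some_file.txt' or die "Cannot open some_file.txt: $!";
            my $first_line = <$handle>;
            print $first_line;
        }   # $handle goes out of scope here, and perl closes the file for us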

      Maybe you could also check your code for pieces like this: open HANDLE, $filename;

      On my system, if you re-use a glob handle, the old one gets closed. It would take thousands of hard-coded lines like that to be causing this problem.
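      A tiny demonstration of that implicit close (the file names are made up):

          open HANDLE, '<', 'first.txt'  or die "first.txt: $!";
          open HANDLE, '<', 'second.txt' or die "second.txt: $!";  # perl closes first.txt here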


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Or maybe generated code in string evals. (Not very likely, but still possible)
      Thanks so much for your quick response. This is good practice so I appreciate the recommendation. In this application I have had no cause to open any files directly, although I believe Authen::Captcha probably opens some, and I use that to generate a session key when a user signs in. Some of the other modules I use may open files as well. Here is the only place I use an open command:
      open ( MAIL, "|$sendmail -oi -t" ) or die "Cannot open $sendmail: $!";
      But this mail module has not been involved with any of the errors that have emerged. Is there a way to run lsof from inside my PERL script? If not, I can suggest it to my hosting service. Thanks, Tony
        You can run external commands from perl (not PERL please, it's not an abbreviation) with either system or qx (the latter might be more useful in your case).

        It is very unlikely that this line is causing the issue: a single pipe to sendmail uses only one file handle, and it is released again as soon as the MAIL handle is closed after the message is sent.
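        For example, a sketch of capturing that from inside the script (this assumes lsof is installed and your host allows running it):

            # list the files currently open in this process; $$ is the script's pid
            my $open_files = qx(lsof -p $$ 2>&1);
            warn "lsof for pid $$:\n$open_files";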

Re: How to debug "Too Many Open Files" Error
by tilly (Archbishop) on Sep 16, 2008 at 21:51 UTC
    The problem is that there are too many files open. Without knowing the exact message and what messages are in your system I can't tell you whether the limit that is being exceeded is the number your process may have open, or the whole system. At a guess if it doesn't say anything about the system then it is your process. But to get a better sense you should look through the possible system messages to figure out which error message you're getting. This line of code will give you those messages:
    perl -le 'print($! = ++$_) until $! =~ /^Unknown/'
    On my system two of the messages are:
    Too many open files in system
    Too many open files

    The first message would indicate a global problem on my machine, the second a problem specific to my process.

    If it is global, you need to look at what is happening on the machine as a whole. It may not be easily fixable (though in Linux you can get them to up the limits, as others have told you.) If it is local to your process then the change in behaviour is driven by data that you have. Somehow you're opening files and not closing them. Without looking at your code I can't tell you where that is happening. However what you need to do is close some of your open files.

    You can do this explicitly, or by using the built-in FileCache to close and reopen filehandles behind the scenes. Do note that it has a bug where it does not reset the position of filehandles if it opens and closes them. If that matters to you then I would create a tied class that can open and close files like FileCache does but also keeps track of positions.
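    As a minimal sketch of the FileCache approach (the list of output files is hypothetical):

        use FileCache;   # core module that juggles more files than the limit allows

        for my $path (@output_files) {
            my $fh = cacheout $path;      # opens $path, or transparently reopens it,
            print $fh "another line\n";   # closing least recently used handles as needed
        }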

    If the problem is global it might still be the fault of your process. Aggressively avoiding multiple files open is not a bad idea. But the odds are that you aren't the culprit of the system problem. Instead it has to do with system load and, somewhere, some process is using lots more filehandles than it needs.

      Thanks for the very thorough answer. Here is a typical extract from my error log:
      [Tue Sep 16 06:47:33 2008] [error] [client 65.5.128.20] (24)Too many open files: couldn't spawn child process: /var/www/vhosts/cfhs69.com/cgi-bin/alumni001A.pl
      Based on your comment, this would imply that there are too many open files in my process, not in the system. The difficulty for me is that I do not open files directly. I have one open command, which is for MAIL, and this module has not been involved in the failures. I use some other PERL modules that may open files, such as Authen::Captcha, but I assume they are well behaved, and I only use it once per user. And I only have 88 users, and I would be surprised if there are ever more than 2 using the site at once. Tony
        The full error provides some extremely useful context. Something cannot run the command /var/www/vhosts/cfhs69.com/cgi-bin/alumni001A.pl. Glancing at your website, I'm going to bet that that command is what is run by Apache to serve the page http://www.cfhs69.com/cgi-bin/alumni001A.pl, which tells me that the error message is almost certainly coming from Apache, and not from your code.

        Googling around I am willing to bet that it has to do with the problem explained in http://httpd.apache.org/docs/1.3/misc/FAQ.html#fdlim. Namely that your hosting provider is trying to run too many virtual hosts out of Apache, with the result that sometimes you get an Apache process that has opened up too many files, connections, etc and can't open another. (They may be using Apache 2, but the same issue still exists because it is an operating system limit, not an Apache limit.)

        In which case the possible solutions are that they can raise the operating system limit as is documented there, they can run fewer virtual hosts per server, or you can find a more competent hosting provider.

        PS: Please call the language Perl, not PERL. See the FAQ for verification. Calling it by the wrong name grates on people who are competent with the language. So unless you actively want to irritate people here, you really should call it by the right name.

        Are you sure the alumninnnX.pl programs are exiting properly and not hanging?
Re: How to debug "Too Many Open Files" Error
by betterworld (Curate) on Sep 16, 2008 at 19:21 UTC

    It might help to know what technologies you use. Although most hosting providers only provide CGI as a means to run Perl scripts, this error message sounds more like it could happen in a long-running service like mod_perl or FastCGI.

    Anyway, if the server runs Linux, the result of `ls -l /proc/$$/fd/` could be interesting.

      Thanks so much for your quick response. I'm not familiar with the use of "technologies" as you have used it. I have the following "use" statements:
      use strict;
      use CGI qw(:standard escape escapeHTML);
      use Authen::Captcha;
      use Image::Magick;
      use Data::FormValidator (qw/filter_ucfirst/);
      use Data::FormValidator (qw/valid_email/);
      use CGI ':cgi-lib';
      use Regexp::Common (qw/profanity/);
      use lib qw(/var/www/vhosts/cfhs69.com/cgi-bin);
      use CGI::Carp qw(fatalsToBrowser);
      Is there any way I can run the command you show from inside my script? If not, I can ask my hosting service to run it. Needless to say, they do not invite me to run commands directly on their server. Tony
        Is there any way I can run the command you show from inside my script? If not, I can ask my hosting service to run it. Needless to say, they do not invite me to run commands directly on their server.

        It's important that this command be run by your script a short time before it crashes. Otherwise the output might not be helpful.

        As you have not posted code, I assume that you don't know where the error happens. This makes it difficult to run the command shortly before that, so I suggest putting something like

        warn `ls -l /proc/$$/fd/` . ' ';

        in various places in your code; you will probably see the output in your server log.

        The appended whitespace ' ' ensures that the string does not end in a newline; warn() and die() only append the source file and line number when the message does not end in a newline, and that line number tells you which of the warn calls produced the output.

      The bug comes from opening files without closing them after use.
Re: How to debug "Too Many Open Files" Error
by ruzam (Curate) on Sep 17, 2008 at 02:21 UTC
    I've run in to problems with Image::Magick before (Image::Magick Exception 430).

    It doesn't release the file handles used to read/write image files. I had a script running against directories of (coincidentally) high school photos that turned them into various sized thumbnails. Despite everything I could think of to 'close' Image::Magick between images, after about 2048 or so image manipulations (whatever the system's limit on open file handles was), the script died with too many open files.

    If your script is being run persistently, and it's using Image::Magick to read and save images, I can see how it might run out of file handles over time. I suspect the persistent process resets when it dies on the error, and then the random wait until it happens again starts over.

    If you give more information on how you're using Image::Magick, it could be discounted (or focused on) as your problem. It may be as simple (or as difficult) as an Image::Magick update.
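    For reference, a sketch of the kind of per-image cleanup worth trying (the file list and thumbnail names are made up, and whether the handles are actually released depends on the Image::Magick build):

        use Image::Magick;

        for my $file (@photo_files) {
            my $img = Image::Magick->new;        # fresh object for every image
            my $err = $img->Read($file);
            if ($err) { warn "read $file: $err"; next; }
            $img->Resize(geometry => '120x120');
            $err = $img->Write("thumb_$file");
            warn "write thumb_$file: $err" if $err;
            undef $img;                          # drop the object, hoping its handles go with it
        }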
Re: How to debug "Too Many Open Files" Error
by eighty-one (Curate) on Sep 16, 2008 at 20:05 UTC
    Is there any particular action or section of the site that seems most error-prone? Also, is there a particular time of day or day of the week that you've noticed these happening? Do you have a way to check usage and/or load to see if there's any correlation with the error occurring v. the number of users?

    Would modifying the code to write to a log or send an email be a possibility? You may not notice a helpful pattern just based on user reports (as not every user will report every error) but a log of every occurrence might make the cause and/or solution more obvious.
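    As a sketch of the logging idea (the log path is hypothetical and must be writable by the web server user):

        # append one line per interesting event; call this from the CGI scripts
        sub log_event {
            my ($msg) = @_;
            open my $log, '>>', '/var/www/vhosts/cfhs69.com/logs/debug.log'
                or return;    # never let logging itself kill the request
            print $log scalar(localtime), " pid=$$ $msg\n";
            close $log;
        }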
      Thanks for your thoughts on this problem. I have noticed no regular pattern to the problem, although I have had nearly no failures at night or on weekends. However, the host talked about migrating my site(s) to a new server with less traffic, so that work might have improved things as well. I have no way of checking for load on the server. I am quite sure the load on my app is very low; I suspect there are never as many as 3 users. Within this thread I got a couple of suggestions to give to the host to check his load. Shortly, I plan to insert the following code into the 3 most popular modules in the application, based on suggestions in the thread above:
      warn `ls -l /proc/$$/fd/` . ' ';
      However, if I were a hosting service, I would not allow commands like this to run; so I wonder if it will be blocked. If so, I will ask the host to run it; but that is not as useful, because one would like to run it right before my app crashes. Tony
Re: How to debug "Too Many Open Files" Error
by mr_mischief (Monsignor) on Sep 17, 2008 at 13:46 UTC
    There are a number of ways that the number of open files is limited.

    The Perl process may have hit a process limit for open files. The web server may have hit a process limit for open files if you're running inside the web server's process. The user that runs the Perl process may have hit a per-user open files limit. Some operating systems also impose a system-wide limit on the number of open files (on Linux, /proc/sys/fs/file-max).

    The command ulimit -a will tell someone all about the soft user limits set on processes spawned under a particular shell, which is usually a user's login shell, under Bash or ksh on a Unixy OS. ulimit -a -H will show the hard limits. ulimit -n shows specifically the number of open file handles allowed per process and can be used with -H as well to see the hard limit for that.
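    If you cannot get a shell on the host, a sketch of asking for the same numbers from inside a script (ulimit is a shell builtin, so it has to go through sh):

        chomp(my $soft = qx(sh -c 'ulimit -Sn'));   # soft limit on open files
        chomp(my $hard = qx(sh -c 'ulimit -Hn'));   # hard limit on open files
        warn "per-process open file limits: soft=$soft, hard=$hard\n";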

      Thanks very much for your advice. I assume the commands you suggest are unix commands. I will request my hosting service to run them. Tony
        Yes, you assumed correctly. I'm not sure why you needed to assume, since the first sentence of the paragraph containing the commands says "on a Unixy OS", but you did assume correctly.
Re: How to debug "Too Many Open Files" Error
by Lawliet (Curate) on Sep 16, 2008 at 19:37 UTC

    From some searching around the interweb, may I suggest you log into your web server and type echo 8192 > /proc/sys/fs/file-max && echo 32768 > /proc/sys/fs/inode-max into the terminal?

    That is, of course, assuming your hosting company uses Linux. And that you have full access to the box.

    Take the suggestion with a grain of sugar - it is hard to deduce the problem without more information.

    Update: Take the suggestion even less than a grain of sugar (see moritz's post in reply to this one)

    I'm so adjective, I verb nouns!

    chomp; # nom nom nom

      I'm quite sure that no sane hosting company will offer write access to /proc/sys/fs/* (unless on a "root server"), and if it does you don't want to use that hosting company anyway.

        Haha wow - good call.


      Take the suggestion even less than a grain of sugar

      Actually it's a good idea to look at what's in that file (reading it should be possible for ordinary users), and at what "ulimit -n" prints. This should be possible from a CGI script, without ssh access.

      If the number is something like 1024, we can be almost sure that there is a programming mistake in the OP's scripts. However, if the provider chose to lower that number to something like 10 (to prevent excessive use of resources), the web software might need some restructuring to make sure that file handles (including sockets) are never open longer than necessary.
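      A sketch of what that check could look like from inside a CGI script:

          # the system-wide limit; /proc/sys/fs/file-max is world-readable
          open my $fh, '<', '/proc/sys/fs/file-max' or die "file-max: $!";
          chomp(my $file_max = <$fh>);
          close $fh;

          # the per-process limit, via a shell because ulimit is a builtin
          chomp(my $per_process = qx(sh -c 'ulimit -n'));

          warn "system-wide file-max: $file_max, per-process limit: $per_process\n";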

      Thanks for the response. I assume there is no way I can run this command from inside my PERL script. However, my hosting service is very responsive, and I will ask them to run the code. What would they expect to see as a result? What are they likely to tell me, or what might we do to fix the problem? Thanks, Tony