Pstack has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I am looking for ideas to contain and manage a Segmentation Fault that arises after several iterations of a batch process which, admittedly, is huge in both Perl code and data, with data structures that may not be getting cleaned up properly when finished with. The system monitor shows 100% CPU usage and 98% memory usage after the crash.

The process suite, normally invoked for just a single iteration, has given no problems before. But in a batch loop, after 5 to 12 iterations or so, it runs out of puff with a Segmentation Fault. eval does not trap the crash.

The code is far too big and complex to go into: it uses many external modules (some large), such as BerkeleyDB and Spreadsheet::WriteExcel, tons of hashes and arrays, and maybe 20 or so internal modules in the worst cases.

If I could somehow predict the overload (my best guess at the cause, given the circumstances) in advance, I could break the batch runs up into smaller, non-arbitrary chunks. Or, if there were a module that cleans up after certain modules before they are re-invoked, perhaps I could make use of that? Unfortunately there is no message besides the raw "Segmentation Fault".
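
Something like the following is the sort of thing I have in mind (just a sketch; @jobs, process_one_job() and the 500 MB ceiling are placeholders standing in for the real batch code):

    # Linux-only: read this process's resident set size (in kB) from /proc.
    sub rss_kb {
        open my $fh, '<', "/proc/$$/status" or return;
        while (<$fh>) {
            return $1 if /^VmRSS:\s+(\d+)\s+kB/;
        }
        return;
    }

    my $ceiling_kb = 500_000;          # arbitrary 500 MB ceiling

    for my $job (@jobs) {              # @jobs stands in for the batch list
        process_one_job($job);         # stands in for one existing iteration

        my $rss = rss_kb();
        if (defined $rss && $rss > $ceiling_kb) {
            warn "RSS at $rss kB after job $job - ending this chunk early\n";
            last;                      # leave the rest for a fresh process
        }
    }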

OK, you probably get the idea. This is Perl 5.8.7 compiled on Red Hat 8 with Linux kernel 2.4.20.

Any helpful thoughts appreciated.

cheers

Pstack

Re: segmentation fault
by superfrink (Curate) on May 05, 2007 at 07:10 UTC
    Segmentation faults can be caused by a couple of things. Many times they are caused by accessing a memory address not allocated to the process (i.e. a bad pointer). Generally this is not a problem for code written in Perl.

    A segfault can also be caused by a process running out of stack space. This tends to show up in programs with recursion-based algorithms. I don't know for sure, but I thought I once heard that Perl deals with deep recursion just fine. That said, Perl is written in C, and there may be a problem in a library function, for example.

    Since the program runs normally on small runs but not on large ones, stack overflow is my first guess. I think the stack size limit is 8 megabytes by default on Linux; once the stack grows beyond that, a segfault occurs.

    You can try increasing the stack space using the "ulimit" builtin in bash (or see the documentation for your shell). For example, to check the current limit and then double it:
    $ ulimit -s
    8192
    $ ulimit -s 16384
    $ ulimit -s
    16384
    For more information on ulimit read the man pages for "ulimit" and "bash" or run "ulimit -a".

    If you haven't tried it yet, you could also run each job in a separate process, i.e. call perl once per job from a loop rather than have one execution of perl run all of the jobs. I don't know how to predict how many jobs could run in a single process before it becomes too many.
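
    Roughly like this, say, where run_one_job.pl stands in for whatever script runs a single iteration (a sketch, not tested against your setup):

        # Driver: start a fresh perl for every job, so each run gets a clean
        # interpreter and all of its memory goes back to the OS afterwards.
        for my $job (@jobs) {                     # @jobs is a placeholder list
            my $status = system('perl', 'run_one_job.pl', $job);
            warn "job $job finished with status $status\n" if $status != 0;
        }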

    Segfaults can also be caused by flaky hardware. It is also possible to cause a segfault intentionally by sending signal 11 with the "kill" system call.

    Update: eval will not catch a segfault. A segfault causes Linux to kill the process and create a core file (if ulimit is set to allow core files). See the "signal" man page in section 7 (not section 2), i.e. run "man 7 signal", for a description of what the system does for each signal. A segfault is signal number 11 and is also known as a segmentation violation (SIGSEGV).
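
    In the one-process-per-job arrangement above, the parent can at least tell a segfaulted child apart from a normal failure by decoding $? (see perlvar), for instance:

        my $status = system('perl', 'run_one_job.pl', $job);   # as in the sketch above

        if ($status == -1) {
            warn "could not start the job: $!\n";
        }
        elsif (($status & 127) == 11) {            # child died from SIGSEGV
            my $core = ($status & 128) ? 'with' : 'without';
            warn "job segfaulted ($core a core dump)\n";
        }
        elsif ($status >> 8) {
            warn "job exited with code ", $status >> 8, "\n";
        }
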
      Some good thoughts there to work on, thanks.

      >> ..try to run each job in a separate process..?

      Do you mean a system()'d or fork()+exec()'d call through a new shell to a fresh child perl each time, from a parent perl script? If not, I'm at a bit of a loss as to how this might be arranged.

      I agree that stack overflow seems the most likely culprit, and I wonder whether you think my "threaded" Perl installation could somehow be contributing? While this application makes no explicit use of threads, maybe the interpreter does?

      regards

      Pstack

      UPDATE:

      A higher ulimit -s did not itself fix the problem, but it helped track it down, since a bigger stack allowed more successful runs before the segfault. What was incrementally clogging up the stack appears to be runaway Berkeley DB cursor handles that were never properly closed. I suppose Perl would not free them merely because they go out of scope, since they belong to the underlying C library. Thought you might like to know.
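
      For anyone hitting the same thing, the explicit clean-up looks roughly like this with the BerkeleyDB module (a sketch only; the file name is a placeholder):

          use BerkeleyDB;

          my $db = BerkeleyDB::Btree->new(
              -Filename => 'data.db',              # placeholder file name
              -Flags    => DB_CREATE,
          ) or die "cannot open database: $BerkeleyDB::Error";

          my $cursor = $db->db_cursor();

          my ($k, $v) = ('', '');
          while ($cursor->c_get($k, $v, DB_NEXT) == 0) {
              # ... work with $k / $v ...
          }

          $cursor->c_close();    # release the underlying C cursor explicitly
          undef $cursor;         # and drop the Perl wrapper too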

      BTW, running so close to the default, it seems a pity not to be able to raise that shell stack limit from within Perl just for this routine (i.e. without altering the default itself)? My attempts anyway ran into a Wall.
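
      For the record, the BSD::Resource module from CPAN is one way to try that from inside the script: it can raise the soft limit for the current process and its children only, up to the hard limit, without touching the shell default. A rough sketch (assuming the module is installed; whether an already-running perl gains anything from the new limit is another matter):

          use BSD::Resource;    # CPAN module; exports getrlimit/setrlimit and RLIMIT_*

          my ($soft, $hard) = getrlimit(RLIMIT_STACK);
          warn "stack limit: soft=$soft hard=$hard\n";

          # Raise the soft limit to 64 MB (an arbitrary figure) for this
          # process only; unprivileged processes cannot exceed the hard limit.
          setrlimit(RLIMIT_STACK, 64 * 1024 * 1024, $hard)
              or warn "setrlimit failed: $!";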

      cheers

      Pstack

        btw I am making some progress via ulimit resets, getting feedback, etc.
Re: segmentation fault
by RL (Monk) on May 05, 2007 at 14:01 UTC

    Perhaps you can reproduce a batch run that will lead to a segmentation fault.

    Then have your script start an strace on its own pid and write the output to a logfile:

    exec "strace -p$$ >/whatever/logfile";

    Don't know if it works. Just an idea.
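
    A fork-based variant (equally untested, and /whatever/logfile is still just a placeholder) would keep the main process running while a child attaches strace to it:

        my $parent = $$;                       # pid of the process to trace
        my $pid = fork();
        if (defined $pid && $pid == 0) {       # in the child: attach strace to the parent
            exec('strace', '-p', $parent, '-o', '/whatever/logfile')
                or die "could not exec strace: $!";
        }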

    RL

      Another good idea, thanks. I have no experience with strace either, but it might be worth a try. Cheers.

      Pstack