What OS? How much RAM on the machine? Do you have swap configured?
I'm going to assume a variant of Unix, because one child dies and the rest continue. At a wild guess, you've got 2 GB of RAM and no swap, in which case you are running out of RAM. Adding more RAM or configuring swap should stop the problem, though relying on swap will slow things down a lot.
Another possible fix is to split your data into a larger number of smaller pieces, then use something like Parallel::ForkManager to process it with a fixed number of children running at any time. That will give you the parallelism you're looking for while controlling how much memory you need at any one time. Keep the size of the target pieces fixed; that way, as your dataset continues to grow, your memory needs will stay constant. A sketch of that approach follows.
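Something along these lines (untested sketch; the chunk file names and process_chunk() are placeholders for however you actually split and handle your data):

    use strict;
    use warnings;
    use Parallel::ForkManager;

    # Cap concurrency so memory use stays roughly constant no
    # matter how many chunks there are.
    my $pm = Parallel::ForkManager->new(4);

    my @chunks = glob('data/chunk_*');   # however you split the data

    for my $chunk (@chunks) {
        $pm->start and next;     # parent: move on to the next chunk
        process_chunk($chunk);   # child: do the work
        $pm->finish;             # child exits, freeing its memory
    }
    $pm->wait_all_children;

    sub process_chunk {
        my ($chunk) = @_;
        # ... your real per-chunk work goes here ...
        print "processed $chunk\n";
    }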
No, RAM is not the issue. The server is an Ubuntu 8.04 box with 8 GB of RAM, the kernel is PAE-enabled, and it can see all 8 GB. Swap is also not an issue - there is 8 GB of swap.
Unfortunately I can't make the children smaller. They each load an instance of a Bayesian classifier model trained on a large data set.
The only real solution would be to write a server that loads that classifier, launch several of those servers listening on different ports, and then have the spawned children I mentioned earlier talk to those servers over sockets in a round-robin fashion. It's basically a way of offloading the heavy data processing to separate instances rather than doing it inside the child processes that sometimes crash.
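The client side of that could look roughly like this (a sketch only; it assumes classifier servers already listening on localhost ports 9001..9004 and speaking a simple line-oriented request/response protocol, which is entirely made up here):

    use strict;
    use warnings;
    use IO::Socket::INET;

    my @ports = (9001 .. 9004);   # assumed classifier server ports
    my $next  = 0;

    sub classify {
        my ($text) = @_;
        my $port = $ports[ $next++ % @ports ];   # round-robin selection
        my $sock = IO::Socket::INET->new(
            PeerAddr => 'localhost',
            PeerPort => $port,
            Proto    => 'tcp',
        ) or die "Cannot connect to classifier on port $port: $!";
        print {$sock} "$text\n";         # send one request line
        chomp(my $label = <$sock>);      # read one response line
        close $sock;
        return $label;
    }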
So to reiterate, you are not aware of any restrictions on parent-child memory allocation? Nothing related to the value of SHMMAX or similar settings?
The current value of SHMMAX is 32 MB, by the way.
I am not aware of anything like that. That isn't to say there is no such limit, just that I'm not aware of one; I am neither a sysadmin nor an expert on Linux internals. (However, from googling SHMMAX, that setting should be entirely unrelated unless you are deliberately using shared memory.)
However, one question that comes up is whether all of the children are loading the same instance of the Bayesian classifier model. If so, you can save on RAM by forking one child, having that one load the classifier model, and then having it fork itself into 4. Thanks to copy-on-write, the 4 children will then share a lot more memory. As they continue to work, some of that memory will become unshared, but it may still save you a lot overall.
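Roughly like this (a sketch, with load_classifier() and do_work() standing in for whatever your real loading and processing code is):

    use strict;
    use warnings;

    # Load the big model once; the pages are then shared copy-on-write
    # with every child forked afterwards.
    my $classifier = load_classifier();

    my @pids;
    for my $n (1 .. 4) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                  # child
            do_work($classifier, $n);
            exit 0;
        }
        push @pids, $pid;                 # parent keeps track of children
    }
    waitpid($_, 0) for @pids;

    sub load_classifier { return { model => 'stand-in for your trained classifier' } }
    sub do_work { my ($model, $n) = @_; print "child $n working\n" }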
Now why are you running out of memory? I don't know. In theory you have 16 GB of memory (RAM plus swap) available to you. However, it is possible that other things are using most of it, or that some sysadmin has set a ulimit on how much memory the user you're running as can use. Whatever the case, the behavior you describe is consistent with running out of memory at close to 2 GB.
But that is testable. You just need to create several deliberately large processes and see where they run out of RAM.
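A crude probe along these lines (runnable as-is; run two or three copies at once if you want to see the combined limit) will grow until the kernel, or a ulimit, stops it:

    use strict;
    use warnings;

    # Grow a buffer in 100 MB steps and report how far we get
    # before the allocation is killed or dies.
    my $chunk = 'x' x (100 * 1024 * 1024);
    my $buf   = '';
    my $mb    = 0;
    while (1) {
        $buf .= $chunk;
        $mb  += 100;
        print "allocated roughly ${mb} MB\n";
    }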
I would be looking for ways to make the child processes use less memory. (Are input files being slurped when they could be processed one record at a time? Is each child making unnecessary copies of its input data, e.g. by reading a whole file into a scalar then splitting into an array? Are there complex data structures where simpler storage would do? Would it make sense to use additional disk-based resources instead of in-memory data structures, e.g. dbm files or other database(-like) storage?)
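For the slurping point in particular, processing one record at a time keeps the peak footprint at roughly one record rather than the whole file. A minimal sketch (input.dat and handle_record() are placeholders):

    use strict;
    use warnings;

    open my $fh, '<', 'input.dat' or die "Cannot open input.dat: $!";
    while (my $line = <$fh>) {       # one record in memory at a time
        chomp $line;
        handle_record($line);
    }
    close $fh;

    sub handle_record {
        my ($record) = @_;
        # ... your real per-record work goes here ...
    }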
Failing that, I'd be checking whether it's really necessary to have four children running at once. What does that quantity get you that you don't get with two consecutive jobs with two children per job?
If there is just one factor that makes the difference between "it works" and "it fails", that factor is the size of the input files, and those files are only going to keep growing, then you've got a scaling problem, which is a kind of design problem. Anything that doesn't solve the design problem is just a stop-gap with a limited life-span.
Solving the design problem is a matter of figuring out how to complete the task within a finite amount of RAM, so that the process runs with a stable, consistent footprint no matter what size the input data may be.
If the problem only started two weeks ago, then something in your data has changed in the last two weeks. So, if you can, get the backups from the last four weeks and do a binary search over the data to find two (possibly adjacent) datasets, one of which "works" and one of which "doesn't work". Then analyze what is different between the two datasets.