Sandy has asked for the wisdom of the Perl Monks concerning the following question:

I have a reasonably large program that queries more than 200,000 records from a database, manipulates the results, and prints the output to a flat file (over 50 Meg in size).

Everything was fine until I was asked to place this program onto a less powerful machine as a 'backup'. When I ran the program, Solaris complained that I had run out of memory.

So I read, and experimented, and read some more, and so forth. I reduced the memory consumption enough that the program still runs, and added more swap space, but it still wastes a lot of time 'swapping' memory in and out. (On a sufficiently robust system it takes less than 20 minutes to run; on this workstation it takes more than 2 1/2 hours.)

In the process of learning, I found a system memory hog that I don't quite know how to deal with.

Here is a little program that illustrates my problem. The my $str declaration is inside the for loop to mimic my original program; I know that in this context it looks silly. During execution I monitor the system resources using vmstat 1 in another window. I used sleep to slow the process down so that I could monitor the memory usage more easily.

#!/usr/bin/perl
use strict;

# Loop # 1
my @outfiles = qw(d.d e.e f.f);
foreach my $file (@outfiles) {
    print "\$file = $file\n";
    open OUTFILE, ">", "$file" or die "Ooops\n";
    for my $i (1..50000) {
        if (int($i/1000)*1000 == $i) {
            sleep(1);
        }
        my $str = " " x 300 . "\n";
        print OUTFILE $str;
    }
    close OUTFILE;
}

@outfiles = qw(d.d d.d d.d);
open OUTFILE, ">", "d.d" or die "Oops\n";
close OUTFILE;

# Loop # 2
foreach my $file (@outfiles) {
    print "\$file = $file\n";
    open OUTFILE, ">>", "$file" or die "Ooops\n";
    for my $i (1..50000) {
        if (int($i/1000)*1000 == $i) {
            sleep(1);
        }
        my $str = " " x 300 . "\n";
        print OUTFILE $str;
    }
    close OUTFILE;
}
In the 1st major loop, the available memory goes down by 15Meg for each file, but recovers this memory when the file is closed. In the 2nd major loop, the available memory goes down a total of 45Meg, without recovering the memory when the file is closed and re-opened in 'append' mode.

The graph of the free memory looks (roughly) like this:

[ASCII graph of free memory (KB, from vmstat) over time: during Loop 1 it oscillates between roughly 420,000 and 400,000, recovering after each file is closed; during Loop 2 it declines steadily to roughly 370,000 and never recovers.]
My original program requires that the file be written (15 other programs read that file, and I do not have the luxury of rewriting them), so I cannot switch to a database to store this data.

I could write the file in separate 'chunks' and then cat them all together at the end, but this seems kinda clunky and inelegant.

Does anybody have any ideas?

I've tried flushing ($|++) and format/write. Neither helped.

Thanks
Sandy.

PS: I am running Solaris 2.8 (and rehosting onto Solaris 2.7). I am using perl 5.8.2

UPDATE:

I found some more information about the problem, thanks in part to browseruk who gave me the key words to google for (file caching solaris).

Under Solaris, to minimize I/O time, the OS sucks the entire open file into memory. I tested my program on the machine that had sufficient memory, so without swapping, the memory consumption now seems logical. (The test files were 15Meg, and the memory consumption was 15Meg.) Also, when reopening a file for appending, the existing file contents are loaded into memory. This is why closing and reopening a file in append mode did nothing.

Solaris 2.8 behaves nicely when swapping. (My test was run on Solaris 2.8.) It does not swap out processes if it runs out of memory for I/O. Solaris 2.7, however, is not so nice: it does swap out processes if it needs more memory for I/O. This can cause the machine to spend more time swapping than working. Because my Solaris 2.7 workstation is low on memory, it can't hold the entire file in memory, so it starts swapping. Hence my original 20-minute program took 2 1/2 hours on the workstation.

There is a solution called priority_paging, which supposedly fixes the problem (I haven't tried it yet).
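
From what I have read so far (untested on my side, and apparently only relevant to Solaris 2.6/2.7; Solaris 8 is said to handle the page cache differently), priority paging is enabled by adding one line to /etc/system and rebooting:

    # /etc/system -- enable priority paging (reboot required)
    set priority_paging=1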

Reference:

http://www.princeton.edu/~psg/unix/Solaris/troubleshoot/ram.html

http://sunsolve.sun.com/pub-cgi/show.pl?target=content/content8

http://www.sun.com/sun-on-net/performance/priority_paging.html

Sandy

Re: Managing System Memory Resources
by McMahon (Chaplain) on Jun 25, 2004 at 20:55 UTC
    Using vmstat 1, I get the same free memory graph you report when I run your program on FreeBSD 4.9, but at a different scale.

    freebsd.org has a pretty good set of documentation. You might be able to find something about your problem there, since the behavior on Solaris and FreeBSD 4.9 seems to be nearly identical.
Re: Managing System Memory Resources
by BrowserUk (Patriarch) on Jun 25, 2004 at 20:27 UTC

    I tried your code (on win32) and I see zero memory growth beyond the initial program load with either version, which is pretty much what I expected. The entire program (both loops) never used more than 1.5 MB of RAM!

    That suggests (perl build differences aside) that the problem lies with the platform rather than perl itself.

    Given that you are writing to the file(s) line by line, it would appear as if your OS is caching the entire file in memory.

    I know nowt about Solaris, but maybe there is a configuration option that tells the OS to cache entire files, and that can be turned off?


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
Re: Managing System Memory Resources
by dws (Chancellor) on Jun 26, 2004 at 06:22 UTC

    I could write the file in separate 'chunks' and then cat them altogether at the end, but this seems kinda clunky and inelegant.

    I hear:

    But [for unstated reasons and beliefs] this seems [to me to be] kinda clunky and inelegant.

    Take a stab at filling in that first missing chunk, and maybe we can get somewhere with the underlying issue. Why does writing smaller chunks and concatenating them later seem to be inelegant?

      If it is the OS caching the entire file in memory, then isn't writing chunks then concatenating them simply going to defer the memory consumption problem to the program doing the concatenation?


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

        ... isn't writing chunks then concatenating them simply going to defer the memory consumption problem to the program doing the concatenation?

        Nope. Concatenating files takes very little application memory unless you're trying to optimize disk head movement. Even then, the OS generally does a better job. Think

        while ( <IN> ) { print OUT $_; }
        but in slightly larger chunks.
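
        For instance, something along these lines (an untested sketch, with made-up chunk and output file names) copies 64K at a time, so memory use stays flat no matter how big the pieces are:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Untested sketch: concatenate chunk files through a fixed-size
        # buffer, so memory use stays constant regardless of file size.
        my @chunks = qw(out.part1 out.part2 out.part3);   # made-up names
        my $final  = 'out.final';                         # made-up name

        open my $out, '>', $final or die "Can't write $final: $!\n";
        for my $chunk (@chunks) {
            open my $in, '<', $chunk or die "Can't read $chunk: $!\n";
            my $buf;
            while ( read($in, $buf, 64 * 1024) ) {   # 64K at a time
                print {$out} $buf;
            }
            close $in;
        }
        close $out or die "Can't close $final: $!\n";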

      The reason I find creating files in 'chunks' and then concatenating them later 'clunky' is that I would now have to manage, within the subroutine that does all the db queries, a test of the size of the current 'temp' file, and so on.

      Certainly do-able, but I was hoping for less.
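
      Roughly, this (untested, with made-up file names and a made-up size limit) is the sort of bookkeeping I mean:

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Untested sketch: roll over to a new 'temp' chunk file whenever
      # the current one passes a size limit, then cat the chunks later.
      my $MAX_CHUNK = 15 * 1024 * 1024;   # made-up 15Meg limit per chunk
      my $chunk_no  = 0;
      my $written   = 0;
      my $fh;

      sub next_chunk {
          close $fh if $fh;
          $chunk_no++;
          open $fh, '>', "out.part$chunk_no"   # made-up names
              or die "Can't write out.part$chunk_no: $!\n";
          $written = 0;
      }

      sub write_record {
          my ($line) = @_;
          next_chunk() if !$fh || $written + length($line) > $MAX_CHUNK;
          print {$fh} $line;
          $written += length $line;
      }

      # stand-in for the real db-query loop
      write_record( " " x 300 . "\n" ) for 1 .. 50_000;
      close $fh if $fh;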

      However, I did find some more information about the problem, thanks in part to browseruk who gave me the key words to google for (file caching solaris).

      (I will update my OP and repeat the following info)

      Under Solaris, to minimize I/O time, the OS sucks the entire open file into memory. I tested my program on the machine that had sufficient memory, so without swapping, the memory consumption now seems logical. (The test files were 15Meg, and the memory consumption was 15Meg.) Also, when reopening a file for appending, the existing file contents are loaded into memory. This is why closing and reopening a file in append mode did nothing.

      Solaris 2.8 behaves nicely when swapping. (My test was run on Solaris 2.8.) It does not swap out processes if it runs out of memory for I/O. Solaris 2.7, however, is not so nice: it does swap out processes if it needs more memory for I/O. This can cause the machine to spend more time swapping than working. Because my Solaris 2.7 workstation is low on memory, it can't hold the entire file in memory, so it starts swapping. Hence my original 20-minute program took 2 1/2 hours on the workstation.

      There is a solution called priority_paging, which supposedly fixes the problem (I haven't tried it yet).

      Reference: http://www.princeton.edu/~psg/unix/Solaris/troubleshoot/ram.html, http://sunsolve.sun.com/pub-cgi/show.pl?target=content/content8 and http://www.sun.com/sun-on-net/performance/priority_paging.html

      Sandy

        I love it when a guess comes together:)


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon

        Solaris 2.8 behaves nicely when swapping. ... It does not swap out processes if it runs out of memory for I/O. Solaris 2.7, however, is not so nice.

        One approach to consider, if you're willing to support multiple strategies, is to make the decision to do file processing in-memory dependent on OS version.
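
        For example (an untested sketch; Solaris 2.7 identifies itself as SunOS 5.7 from uname -r, Solaris 8 as 5.8):

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Untested sketch: pick a write strategy based on the SunOS release.
        chomp( my $release = `uname -r` );   # e.g. "5.7" or "5.8"

        # Plain string compare is good enough for 5.7 vs 5.8.
        my $strategy = ( $^O eq 'solaris' && $release lt '5.8' )
                     ? 'chunked_files'   # older pager: write chunks, cat later
                     : 'single_file';    # newer pager copes with one big file

        print "Running on $^O $release, using the '$strategy' strategy\n";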