infidel2112 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, looking for some help :)

I have a script that has to parse and generate stats from some HUGE logfiles. It's written so that it keeps the bare minimum of logfile content in memory for the least amount of time.

The problem is that garbage collection doesn't seem to happen after the array holding the log data goes out of scope.

Specifically, I watch the memory usage grow during the read by looking at top, but the usage never decreases once the array is out of scope, or even if I undef it in the debugger. It's at least 20M of memory usage post-read, so a drop should be noticeable.

I'm SURE there are no references to it (I don't make any; I just count different items inside the lines), and I even tried making it an object to see if garbage collection would do its thing, but that didn't help.

I'm doing this via ssh, and I have also tried undefing the $ssh object after reading one file, to no effect (memory usage still stays high).

I'm doing something like:

    foreach my $log ( @log_names ) {
        my ($stdout, $stderr, $exit) = $ssh->cmd( "cat $log" );
        my @logdata = split /\n/, $stdout;
        undef $stdout;
        while ( my $line = pop @logdata ) {
            # parse/collect stats on $line
        }
    }

I'm stumped and it's chewing up too much memory after I get through several log files, because it's never freed.

thanks for any suggestions or help!

Replies are listed 'Best First'.
•Re: garbage collection not happening?
by merlyn (Sage) on Nov 04, 2004 at 12:57 UTC
    It's a FAQ that Perl usually can't return freed memory to the OS, so you won't see any reduction in process space, but that memory is available to Perl to reuse during the current process.

    It's also a FAQ that slurping as you've done is generally memory hoggy. You might want to use an interface that lets you look at each line as it comes in, rather than waiting until all 20M have been grabbed. If you really need to do this in reverse order (with pop), you could also look at putting the information into an external data file, like DBD::SQLite or DBM::Deep.
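
    For instance, here is a rough sketch of line-at-a-time processing over a pipe to the system ssh client (assuming key-based auth to the remote host is set up and shelling out to ssh is acceptable; $host, the log path, and the ERROR pattern are just placeholders, not your actual stats logic):

        use strict;
        use warnings;

        my $host      = 'loghost.example.com';   # hypothetical host
        my @log_names = ('/var/log/app.log');    # hypothetical paths
        my %count;                               # whatever stats you collect

        for my $log (@log_names) {
            open my $fh, '-|', 'ssh', $host, "cat $log"
                or die "can't start ssh: $!";
            while (my $line = <$fh>) {            # one line in memory at a time
                chomp $line;
                $count{$1}++ if $line =~ /ERROR (\w+)/;   # e.g. count error types
            }
            close $fh or warn "ssh exited non-zero for $log\n";
        }

    Each line is read, counted, and discarded, so the process never holds more than one line of the log at a time.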

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: garbage collection not happening?
by reneeb (Chaplain) on Nov 04, 2004 at 12:59 UTC
Re: garbage collection not happening?
by dragonchild (Archbishop) on Nov 04, 2004 at 13:46 UTC
    You need to be using Net::SFTP instead of Net::SSH::Perl. Net::SFTP has do_open(), do_read(), and do_close() methods which behave very similarly to sysopen(), sysread(), and close(). Granted, you don't have the nice split-on-$/ behavior of normal filehandles, but you won't blow your memory out, either.

    Of course, you could just pull the logfile into tempspace and read it there ...
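
    A rough sketch of that tempspace approach, assuming Net::SFTP is installed, key-based auth is configured for the host, and its new() and get() methods are used as documented ($host and the log path are placeholders):

        use strict;
        use warnings;
        use Net::SFTP;
        use File::Temp qw(tempfile);

        my $host = 'loghost.example.com';           # hypothetical host
        my $log  = '/var/log/app.log';              # hypothetical remote path

        my $sftp = Net::SFTP->new($host);           # add user/password args as needed
        my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
        $sftp->get($log, $tmp_name);                # copy the remote log into tempspace

        open my $fh, '<', $tmp_name or die "can't read $tmp_name: $!";
        while (my $line = <$fh>) {                  # only one line in memory at a time
            chomp $line;
            # parse/collect stats on $line
        }
        close $fh;

    Once the file is local, a plain while loop over the filehandle keeps only one line in memory at a time, and the temp file goes away when the script exits.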

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: garbage collection not happening?
by TedPride (Priest) on Nov 04, 2004 at 13:39 UTC
    Memory is not freed back to the system until the script ends. However, the memory should be reusable, and 20 MB isn't really that much, so I don't see what the problem is. Is memory use increasing with every iteration of your loop, or do you just not like having your script take up large amounts of memory for its entire run?