Accidental Hack has asked for the wisdom of the Perl Monks concerning the following question:

Hello. I'm using Archive::Zip to read Zips of large web server logs. If I read one alone, the script runs fine. When I do more, say point it at a directory full of a year's worth, it runs through the first Zip fine, but then the CPU spikes to 100% and the RAM usage trends slowly upward without stopping. As far as I can determine, the objects from the first Zip and/or text file are not being disposed of properly. Can anyone point me to an article on this or set me straight? Thanks in advance. Here's the basics of the code I'm using:
    use Archive::Zip;
    use Archive::Zip::MemberRead;
    ...set up my variables and readdir to get list of Zips...
    foreach $file (@files) {
        my $zip    = new Archive::Zip("$source/$file");
        my $member = $zip->memberNamed('WebTrends.log');
        my $fh     = $member->readFileHandle();
        while (defined($line = $fh->getline())) {
            ...process each line and write some values to a file...
        }
        $fh->close();
    }
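For reference, the cleanup pattern suggested in the replies below can be sketched as follows. This is a self-contained toy, not the poster's actual script: it writes a small throwaway archive first so it can be run anywhere, and the line counter stands in for the real per-line processing.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use Archive::Zip::MemberRead;

# Build a small throwaway archive so the sketch is self-contained
# (the file name and contents here are stand-ins).
my $writer = Archive::Zip->new();
$writer->addString("line one\nline two\n", 'WebTrends.log');
die "write error" unless $writer->writeToFileNamed('test.zip') == AZ_OK;

my @files = ('test.zip');
my $lines = 0;

foreach my $file (@files) {
    # 'my' inside the loop gives each iteration fresh lexicals; when
    # they go out of scope, their reference counts drop and Perl can
    # free the previous iteration's archive.
    my $zip    = Archive::Zip->new($file);
    my $member = $zip->memberNamed('WebTrends.log');
    my $fh     = $member->readFileHandle();
    while (defined(my $line = $fh->getline())) {
        $lines++;    # stand-in for the real per-line processing
    }
    $fh->close();
    undef $fh;     # belt and braces: release the member handle
    undef $zip;    # and the archive before the next iteration
}

unlink 'test.zip';
print "read $lines lines\n";
```

The explicit undef calls are mostly belt and braces here, since the lexicals fall out of scope at the bottom of the loop anyway; they matter more if the variables are declared outside the loop.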

Replies are listed 'Best First'.
Re: Archive-Zip memory problems
by Accidental Hack (Acolyte) on Dec 20, 2004 at 20:15 UTC
    Wanted to thank everyone for responding, first of all. I've made a few changes to the test code based on the suggestions and it looks like the problems have *mostly* cleared. What I'm seeing now is a bottleneck more than anything. One Zip is processed (each Zip contains one server log file), the script slows to a crawl for about 1-3 minutes when the one Zip is done and the next begins, and then it continues whizzing through the text file (I'm roughly gauging the speed by having a line counter updating). I'm still a little puzzled by the bottleneck. My best guess is some sort of buffer size issue. Thanks again!
    use Archive::Zip;
    use Archive::Zip::MemberRead;
    ...a few other variables set up here...
    @files = qw(ex041201.zip ex041202.zip ex041203.zip ex041204.zip);
    foreach $file (@files) {
        open(LOG, ">>$log_terms") || die "Cannot open $log_terms. $!";
        $zip  = new Archive::Zip("$source/$file");
        $text = new Archive::Zip::MemberRead($zip, "WebTrends.log");
        while (defined($line = $text->getline())) {
            @splitline = split / /, $line;
            ...do some processing...
            undef(@splitline);
        }
        undef($text);
        undef($zip);
        close(LOG);
    }
    print "\a";
      If you want to find the bottleneck then profile the program:
      perl -d:DProf myscript.pl
      dprofpp
Re: Archive-Zip memory problems
by Jaap (Curate) on Dec 20, 2004 at 16:57 UTC
    You don't show how you open the 2nd zipfile. Do you use the same $zip object or do you re-initialise it?
      Thanks for your time. I just updated the code. I readdir and loop through the list of Zip files.
Re: Archive-Zip memory problems
by KeighleHawk (Scribe) on Dec 20, 2004 at 19:21 UTC
    I'm not an expert on the arcana of Perl variable scoping, but from a purely logical point of view, I would say that since you declared $zip in the loop with a my, it is creating a new $zip on every iteration. Therefore, you may want to either undef $zip, or move the my $zip outside the loop and then just do the $zip = new ... (without the my) inside the loop.

    Either way, it may be a good idea to undef $zip or even call the DESTROY method explicitly to ensure it has been discarded...

    P.S. You should do the same thing with all the other variables you declared inside the loop...

    P.P.S. Thinking about it, I would move all the my's outside the loop. Seems like wasted time to redeclare them on every loop. Might slow down the processing if you are going through enough files.
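The lifetime question discussed above can be checked with a small core-Perl experiment (the Tracker class and its output are purely illustrative): an object held in a loop-scoped my is destroyed as soon as that iteration's scope ends, with no explicit undef needed.

```perl
use strict;
use warnings;

my @destroyed;

# A tiny class whose destructor records when Perl frees an instance
# (the class name is purely illustrative).
package Tracker;
sub new     { my ($class, $id) = @_; return bless { id => $id }, $class }
sub DESTROY { my $self = shift; push @destroyed, $self->{id} }
package main;

for my $id (1 .. 3) {
    my $obj = Tracker->new($id);   # fresh lexical each iteration
    # $obj goes out of scope here, its refcount hits zero, and
    # DESTROY runs before the next iteration starts.
}

print "@destroyed\n";   # prints "1 2 3"
```

So redeclaring with my each pass does release the previous object promptly; whether hoisting the declarations out of the loop actually saves measurable time is something profiling would have to show.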

Re: Archive-Zip memory problems
by redhotpenguin (Deacon) on Dec 20, 2004 at 18:55 UTC
    Have you verified it's actually processing more than one file? You could step through the code with the perl debugger to determine if and where a memory leak is occurring.

    Or you could simplify your program and wrap it in a find statement as a work-around to process individual files:

    find . -name '*.zip' -exec perl archive.pl {} \;
Re: Archive-Zip memory problems
by noslenj123 (Scribe) on Dec 20, 2004 at 19:20 UTC
    I do something very similar. I remember early on having a problem that might be the same as yours. Anyway, what I ended up doing is adding: "undef $fh;" at the bottom of the loop to make sure the object was completely cleaned up before moving on to the next one.