mr.nick has asked for the wisdom of the Perl Monks concerning the following question:
After struggling to reduce the memory usage of a statistics program of mine, and after blaming everything from tie to NDBM_File to DBI, I've finally realized something: my program consumes vast quantities of memory without actually storing anything.
What do I mean? Well, look at the following code and tell me: why does it grow to hundreds of megabytes of memory? I'm not storing any data anywhere.
Do you see anyplace where any data is being stored beyond the one-line-at-a-time level? There are no globals and certainly no variables outside the while (<>) or subroutines.

```perl
#!/usr/bin/perl
use strict;
use CGI;
use URI::Escape;

##############################################################################
#                                                                            #
##############################################################################
if ($ARGV[0]=~/\.gz$/i) {
  open STDIN,"zcat $ARGV[0] |";
  shift @ARGV;
}

##############################################################################
##
sub breakquery {
  my $sz=shift;
  my %res;

  $sz="\L$sz";
  $sz=~s/\.[a-z]{2}\.//g;
  $sz=~s/\@[a-z]{2}.\d+//g;

  while ($sz=~s/[\"\']([^\"\']+?)[\"\']//) {
    $res{$1}++;
  }

  $sz=~s/\s{2}/ /g;
  $sz=~s/[\+\'\"\$\(\)]//g;

  if ($sz) {
    my @terms=split /[\s,]/,$sz;
    for my $t (@terms) {
      $t=~s/^\s+//;
      $t=~s/\s+$//;
      next if $t=~/^\s*$/ || $t=~/^..{0,1}$/ || $t!~/^[a-z0-9\-]+$/ || $t=~/^[0-9]+$/;
      $res{$t}++ unless grep /^$t$/,qw( and not or adj of the for with );
    }
  }
  sort keys %res;
}

##############################################################################
while (<>) {
  next unless m{GET /netacgi/nph-brs\?([^\s]+)};
  my $cgi=new CGI($1);
  next unless defined $cgi;

  my $db=$cgi->param("d");
  next if $db=~/^\s*$/;
  $db="\U$db";
  next if grep /^$db$/,qw( CHNH CHCA );
  next if length($db)!=4;

  my $s4=$cgi->param("s4");
  next unless defined $s4;
  next if $s4=~/^\s*$/;

  my @terms=breakquery $s4;

  print STDERR "\r";
  printf STDERR "%.100s","$db ".join(" ",@terms);
  # $db $t
}
```
So why does this program grow in size when run? Like I said, I'm not accumulating data :( Within 26 minutes or so of running, it exceeds 200MB of memory. After 50 minutes, it consumes 400MB.
Am I missing something really basic here?
Btw, sample input data looks like:
```
anx57-105.dialup.emory.edu - - [01/Mar/2001:00:00:21 -0500] "GET /detail/detail.html HTTP/1.0" 200 19308 "http://chid.nih.gov/netacgi/nph-brs?op4=and&op5=and&op6=and&op7=and&op8=and&op9=and&op10=and&d=CHCP&l=20&Sect1=CINK&co3=and&pg4=all&s4=underserved&co4=and&pg5=mj&s5=cervical+cancer&co5=and&pg6=de&s6=&co6=and&pg7=au,cn&s7=&co7=and&pg8=ti&s8=&co8=and&pg9=ac&s9=&co9=and&pg10=so,av&s10=&s1=@YR%3E=1995+or+199X.&co1=and&s3=&co2=and&s2=&Sect2=IMAGE&Sect3=THESOFF&Sect3=PLUROFF&Sect4=HITOFF&p=1&u=/detail/detail.html&r=8&f=G" "Mozilla/4.73 [en] (Win95; U)"
```
Re: Memory Leaks
by mr.nick (Chaplain) on Apr 11, 2001 at 23:34 UTC