Re: Re: How to find a memory leak - appears to be "system" calls that are responsible

Basically, the file is split on header records that start with '##'; each piece is "cut" out and placed into an output directory.

During initial testing, as I built this script up, a simple ksh script read through the directory and submitted another perl script to format, mail & compress each file. That process handled the stress test very well.

Splitter code

sub endEmailPackage {
    my ($SPLITOUT, $splitoutfilename) = @_;
    print $SPLITOUT endLine();
    close $SPLITOUT;
    my $subrc = system("MailPush $splitoutfilename");
    if ($subrc == 0) {
        logit("MailPush $splitoutfilename submitted successfully");
    } else {
        logit("Bad return code on submission of MailPush $splitoutfilename, return code is $?");
    }
    sleep 2;
}

sub endLine {
    return '##END' . (' ' x 75) . "\n";
}

sub scrubHeaderParm {
    my ($href) = @_;
    foreach my $k (keys %{$href}) {
        $href->{$k} =~ s/^\s+//;
        $href->{$k} =~ s/\s+$//;
    }
}


sub splitupFile {
    my ($INFILE) = @_;
    seek $INFILE, 0, 0;
    my $SPLITOUT;
    my $splitoutfilename;
    while (<$INFILE>) {
        if (/^##A/) {
            my %hopt = /$headerregex/;
            logit($_);
            scrubHeaderParm(\%hopt);
            foreach my $k (keys %hopt) { logit("$k: $hopt{$k}"); }
            endEmailPackage($SPLITOUT, $splitoutfilename) if $addrectot > 0;
            $addrectot++;
            $splitoutfilename = "$prepdir/$hopt{ID}.$hopt{BATCHID}.$$";
            open($SPLITOUT, "> $splitoutfilename") or alert("$!");
            logit("Writing splitup output to $splitoutfilename");
        }
        print $SPLITOUT $_ unless /^##END/;
    }
    endEmailPackage($SPLITOUT, $splitoutfilename) if $addrectot > 0;
}

Mailer code loop

open(INFILE, "< $infile") or alert("$!");
logit("Opening $infile for reading");
my $datequal = strftime('%m%d%C%y%H%M%S', localtime());
my $ofilename = "$hopt{TPID}.$hopt{BATCHID}.$datequal.txt";
my $ofilepath = "$outdir/$ofilename";
open(my $AFILE, "> $ofilepath") or alert("$!");
logit("Opening $ofilepath for writing");
while (<INFILE>) {
    writeAFileOut($AFILE, $_);
}
close $AFILE;
compressAFile();

if (deliverAPackage()) {
    sleep 2;
    my $rc;
    $rc = system("mv $infile $arcdir");
    logit("Return code of $rc after move of $infile to $arcdir");
    my $bfile = basename $infile;
    $rc = system("/usr/contrib/bin/gzip $arcdir/$bfile");
    logit("Return code of $rc after gzip of $arcdir/$bfile");
    unlink($ofilepath);
}

sub compressAFile {
    logit("Compressing $ofilepath");
    my $gziprc = system("/usr/contrib/bin/gzip -f -n $ofilepath");
    logit("Return code $gziprc after gzip of $ofilepath");
    alert("Unable to compress $ofilepath") if ($gziprc);
    $ofilename = $ofilename . ".gz";
    $ofilepath = "$outdir/$ofilename";
}

sub deliverAPackage {
    my $templatefile = "$templatedir/$hopt{EDITYPE}";
    alert("Failed to load template $templatefile") unless (-e $templatefile);
    my $body = `cat $templatefile`;
    $body .= "\n\n";
    $body .= "Effective Date: $hopt{DATE} \n" if ($hopt{DATE} =~ /\S+/);
    $body .= "Admin: $hopt{ADMIN}\n";
    $body .= "Email: $hopt{ADMEML}\n\n";
    $body .= `cat $defaulttemplate`;
    $subject = "$subject - $hopt{TPID}";
    my $mailrc = sendEmail($hopt{EMAIL}, $subject, $body, $ofilepath, $hopt{FILENAME}, $hopt{EXT});
    return $mailrc;
}

sub scrubHeaderOpt {
    my ($href) = @_;
    foreach (keys %{$href}) {
        $href->{$_} =~ s/^\s+//;
        $href->{$_} =~ s/\s+$//;
    }
    $href->{EDITYPE} = substr($href->{ID}, 0, 3);
    $href->{EDITYPE} .= $href->{TYP} if $href->{TYP};
    $href->{TPID} = substr($href->{ID}, 3);
}

sub writeAFileOut {
    my ($OFILE, $data) = @_;
    return if ($data =~ /^##ADD/ && $removeaddsw eq 'Y');
    return if ($data =~ /^##END/);
    $data =~ s/\n/\r\n/g;
    print $OFILE $data;
}

Bottom line: if the system call to MailPush is omitted in endEmailPackage() and a ksh script instead simply loops through all the files in the directory, it runs like a charm. Run as shown above, memory is slurped so hard that the "top" command fails with "not enough memory"...

Also, the existing process had both functions together, and for input files over 10M the system("gzip...") and system("mv...") calls would at some point fail with a -1 return code, which again points to a memory leak. The problem was alleviated somewhat when I replaced system("rm...") with unlink, but it still pops up intermittently, especially on >100M input files.
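Since swapping system("rm...") for unlink helped, one option is to treat the mv and gzip steps the same way, so that no child process is forked for them at all. This is only a minimal sketch, assuming the File::Copy and IO::Compress::Gzip modules are installed; archive_and_compress is a hypothetical helper name and not part of the scripts above:

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(move);
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: move $infile into $arcdir and gzip it there
# without calling system(). Returns the path of the resulting .gz file.
sub archive_and_compress {
    my ($infile, $arcdir) = @_;
    my $target = "$arcdir/" . basename($infile);

    move($infile, $target)
        or die "move of $infile to $arcdir failed: $!";

    # gzip() takes input and output file names and returns false on failure.
    gzip($target => "$target.gz")
        or die "gzip of $target failed: $GzipError";

    # The gzip binary removes its input on success; do the same here.
    unlink($target) or die "unlink of $target failed: $!";
    return "$target.gz";
}
```

Whether this actually avoids the -1 failures is untested here; it only removes the fork from those two steps, which is where the failures were reported.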


Replies are listed 'Best First'.
Re: Re: Re: How to find a memory leak - appears to be "system" calls that are responsible by TilRMan (Friar) on May 15, 2004 at 07:37 UTC
>>... if the system call to MailPush is omitted ... it runs like a charm.

Sounds like a problem with MailPush, then, not with gzip or mv. My guess is that MailPush is trying to be helpful by daemonizing itself to send the mail and returning immediately. Then it up and loads the whole file into memory before mailing it.

Try replacing MailPush with `cat $splitoutfilename > /dev/null` or something similar. If that works, try replacing it with Mail::Mailer or something similar. If that works, you're done! If it doesn't, use top to investigate while parsing a smaller file, one that doesn't completely hose the system.
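The substitution test can be set up so the call site does not have to be edited back and forth. A rough sketch, assuming a hypothetical LEAKTEST environment variable and a hypothetical submit_command helper (neither exists in the posted scripts):

```perl
use strict;
use warnings;

# Hypothetical helper: build the command string handed to system().
# With LEAKTEST set, a stand-in that only reads the file replaces MailPush.
sub submit_command {
    my ($file) = @_;
    return $ENV{LEAKTEST}
        ? "cat $file > /dev/null"   # stand-in: reads the file, sends nothing
        : "MailPush $file";         # the real submission
}
```

In endEmailPackage() the call would then read `my $subrc = system(submit_command($splitoutfilename));`. If memory still balloons with LEAKTEST set, MailPush is probably not the culprit.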
Re: Re: Re: Re: How to find a memory leak - appears to be "system" calls that are responsible by naum (Initiate) on May 15, 2004 at 15:09 UTC
>>Sounds like a problem with MailPush, then, not with gzip or mv. My guess is that MailPush is trying to be helpful by daemonizing itself to send the mail and returning immediately. Then it up and loads the whole file into memory before mailing it.

No because I can string all those calls to MailPush in a ksh for loop and memory usage never goes above 8M whereas the Splitter just leaks memory out the wazoo when using "system" to invoke the process...
Re: Re: Re: Re: Re: How to find a memory leak - appears to be "system" calls that are responsible by TilRMan (Friar) on May 15, 2004 at 16:11 UTC
>>... I can string all those calls to MailPush in a ksh for loop and memory usage never goes above 8M ...

Well, then do that!

I'm sorry, but I at least need more information. top! See how many processes are running at a time, and how much memory they each use. Since you are using system() everywhere, you should never have more than two processes: perl and whatever's in the `system()` (plus the extra shell that `system()` gives you). top combined with `perl -d` should let you find the exact point where the system bogs down.

How about some of this `logit()` output? If logit() is timestamped, it could be useful. Include `$!` in your log messages after a `system()`; this will explain the `-1`s.

Did the original script suffer from this problem? What are these files' names and how do they get called? Which is the library and which are the scripts?
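For the `$!` suggestion, a minimal sketch of how the status of a `system()` call could be decoded for the log. describe_system_status is a hypothetical helper name; the decoding itself follows the pattern in perl's own documentation for `system`:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the status of a system() call into log text.
# Pass in $? and "$!" captured immediately after system() returns,
# before anything else can overwrite them.
sub describe_system_status {
    my ($status, $errno) = @_;
    if ($status == -1) {
        # The child could not be started (or wait failed); $! has the reason.
        return "failed to execute: $errno";
    }
    if ($status & 127) {
        return sprintf("died with signal %d", $status & 127);
    }
    return sprintf("exited with value %d", $status >> 8);
}
```

Right after each `system()` call, something like `logit("gzip: " . describe_system_status($?, "$!"));` would then record why a `-1` came back, for example an out-of-memory error when the child could not be forked.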