
Basically, the file is split at header records that start with '##': each header begins a new piece, and each piece is "cut" out and written to its own file in an output directory.
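
To make the splitting rule concrete, here is a minimal sketch. It assumes that header records begin with '##A' and trailers with '##END', as in splitupFile() below. The name split_packages() is hypothetical. Packages are held in memory rather than written to files only so the logic is easy to check.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of lines into packages, starting a new
# package at every header record beginning with '##A' and dropping '##END'
# trailer lines (endEmailPackage() writes its own trailer). Packages are kept
# in memory here for illustration; the real script writes each to a file.
sub split_packages {
    my @lines = @_;
    my @packages;    # each element is an array ref of lines
    for my $line (@lines) {
        if ($line =~ /^##A/) {
            push @packages, [];        # a header opens a new package
        }
        next unless @packages;         # ignore anything before the first header
        next if $line =~ /^##END/;     # trailers are regenerated on output
        push @{ $packages[-1] }, $line;
    }
    return @packages;
}

my @input = (
    "##ADD header one\n",
    "data 1\n",
    "##END\n",
    "##ADD header two\n",
    "data 2a\n",
    "data 2b\n",
    "##END\n",
);
my @pkgs = split_packages(@input);
printf "%d packages\n", scalar @pkgs;
```

Lines that appear before the first header are skipped in this sketch. In splitupFile() as posted, such lines would be printed to an unopened filehandle.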

During initial testing, as I built this script up, a simple ksh script read through the directory and submitted another Perl script to format, mail and compress each file. That process handled the stress test very well.

Splitter code

sub endEmailPackage {
    my ($SPLITOUT, $splitoutfilename) = @_;
    print $SPLITOUT endLine();
    close $SPLITOUT;
    my $subrc = system("MailPush $splitoutfilename");
    if ($subrc == 0) {
        logit("MailPush $splitoutfilename submitted successfully");
    } else {
        logit("Bad return code on submission of MailPush $splitoutfilename, return code is $?");
    }
    sleep 2;
}

sub endLine {
    return '##END' . (' ' x 75) . "\n";
}

sub scrubHeaderParm {
    my ($href) = @_;
    foreach my $k (keys %{$href}) {
        $href->{$k} =~ s/^\s+//;
        $href->{$k} =~ s/\s+$//;
    }
}

sub splitupFile {
    my ($INFILE) = @_;
    seek $INFILE, 0, 0;
    my $SPLITOUT;
    my $splitoutfilename;
    while (<$INFILE>) {
        if (/^##A/) {
            my %hopt = /$headerregex/;
            logit($_);
            scrubHeaderParm(\%hopt);
            foreach my $k (keys %hopt) { logit("$k: $hopt{$k}"); }
            endEmailPackage($SPLITOUT, $splitoutfilename) if $addrectot > 0;
            $addrectot++;
            $splitoutfilename = "$prepdir/$hopt{ID}.$hopt{BATCHID}.$$";
            open($SPLITOUT, "> $splitoutfilename") or alert("$!");
            logit("Writing splitup output to $splitoutfilename");
        }
        print $SPLITOUT $_ unless /^##END/;
    }
    endEmailPackage($SPLITOUT, $splitoutfilename) if $addrectot > 0;
}
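
endEmailPackage() logs the raw value of $?, but that one number packs several different cases together. The sketch below decodes it the way perlfunc's entry for system() describes. It uses two hypothetical helpers, describe_status() and submit_package(), and print stands in for logit(). A status of -1 means the child never started, and $! then says why. That is the detail that would confirm or rule out a memory shortage. The sketch also uses the list form of system(), which never goes through /bin/sh. The single-string form in the post already skips the shell when the string contains no shell metacharacters, so this mainly guards against unusual file names.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a system() status into a readable message,
# following the decoding shown in perlfunc's documentation of system().
sub describe_status {
    my ($status, $errno) = @_;
    # -1: fork or exec failed, so the child never ran; $! holds the reason.
    return "failed to start: $errno" if $status == -1;
    # Low 7 bits: signal that killed the child; bit 7: core dump flag.
    if (my $sig = $status & 127) {
        return sprintf 'killed by signal %d%s', $sig,
            ($status & 128) ? ' (core dumped)' : '';
    }
    # Otherwise the exit value is in the high byte.
    return sprintf 'exited with value %d', $status >> 8;
}

# Hypothetical replacement for the submission step in endEmailPackage().
# @cmd would be ('MailPush') in the post; print stands in for logit().
sub submit_package {
    my ($filename, @cmd) = @_;
    my $rc  = system(@cmd, $filename);    # list form: no /bin/sh involved
    my $msg = describe_status($rc, "$!");
    print "@cmd $filename: $msg\n";
    return $rc == 0;
}
```

In endEmailPackage() the call would become something like `submit_package($splitoutfilename, 'MailPush')`.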

Mailer code loop

use POSIX qw(strftime);    # strftime() is used below
use File::Basename;        # basename() is used below

open(INFILE, "< $infile") or alert("$!");
logit("Opening $infile for reading");
my $datequal = strftime('%m%d%C%y%H%M%S', localtime());
my $ofilename = "$hopt{TPID}.$hopt{BATCHID}.$datequal.txt";
my $ofilepath = "$outdir/$ofilename";
open(my $AFILE, "> $ofilepath") or alert("$!");
logit("Opening $ofilepath for writing");
while (<INFILE>) {
    writeAFileOut($AFILE, $_);
}
close $AFILE;
compressAFile();

if (deliverAPackage()) {
    sleep 2;
    my $rc;
    $rc = system("mv $infile $arcdir");
    logit("Return code of $rc after move of $infile to $arcdir");
    my $bfile = basename $infile;
    $rc = system("/usr/contrib/bin/gzip $arcdir/$bfile");
    logit("Return code of $rc after gzip of $arcdir/$bfile");
    unlink($ofilepath);
}

sub compressAFile {
    logit("Compressing $ofilepath");
    my $gziprc = system("/usr/contrib/bin/gzip -f -n $ofilepath");
    logit("Return code $gziprc after gzip of $ofilepath");
    alert("Unable to compress $ofilepath") if ($gziprc);
    $ofilename = $ofilename . ".gz";
    $ofilepath = "$outdir/$ofilename";
}

sub deliverAPackage {
    my $templatefile = "$templatedir/$hopt{EDITYPE}";
    alert("Failed to load template $templatefile") unless (-e $templatefile);
    my $body = `cat $templatefile`;
    $body .= "\n\n";
    $body .= "Effective Date: $hopt{DATE} \n" if ($hopt{DATE} =~ /\S+/);
    $body .= "Admin: $hopt{ADMIN}\n";
    $body .= "Email: $hopt{ADMEML}\n\n";
    $body .= `cat $defaulttemplate`;
    $subject = "$subject - $hopt{TPID}";
    my $mailrc = sendEmail($hopt{EMAIL}, $subject, $body, $ofilepath, $hopt{FILENAME}, $hopt{EXT});
    return $mailrc;
}

sub scrubHeaderOpt {
    my ($href) = @_;
    foreach (keys %{$href}) {
        $href->{$_} =~ s/^\s+//;
        $href->{$_} =~ s/\s+$//;
    }
    $href->{EDITYPE} = substr($href->{ID}, 0, 3);
    $href->{EDITYPE} .= $href->{TYP} if $href->{TYP};
    $href->{TPID} = substr($href->{ID}, 3);
}

sub writeAFileOut {
    my ($OFILE, $data) = @_;
    return if ($data =~ /^##ADD/ && $removeaddsw eq 'Y');
    return if ($data =~ /^##END/);
    $data =~ s/\n/\r\n/g;
    print $OFILE $data;
}
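
deliverAPackage() also builds the mail body by running `cat` in backticks. Every backtick command forks a child, just as system() does. If the goal is to cut down on forks, the templates can be read in-process instead. Below is a minimal sketch; slurp_file() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file in-process instead of `cat $file`,
# so no child process is forked. Returns undef if the file cannot be opened;
# the caller decides whether that is fatal.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    local $/;                  # undefine the record separator: read everything
    my $content = <$fh>;
    close $fh;
    return defined $content ? $content : '';
}
```

In deliverAPackage() this would read `my $body = slurp_file($templatefile);`, with an undef result reported through the post's alert(). The same call would also replace `cat $defaulttemplate`.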

Bottom line: if the system call to MailPush is omitted from endEmailPackage() and a ksh script instead simply loops through all the files in the directory, it runs like a charm. With the call left in, as above, memory is consumed so heavily that even the "top" command fails with "not enough memory"...
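
For comparison, the directory loop that works could also be written as a separate, small Perl driver. The driver runs after the splitter has finished, so the splitter itself never calls system(). This is a sketch under assumptions. dispatch_dir() is a hypothetical name, and the command (MailPush in the post) and the directory are passed in. Also, the post only reports results for the ksh version of this loop.

```perl
use strict;
use warnings;

# Hypothetical driver: hand every regular file in $dir, in sorted order, to
# the command in @cmd, one at a time. system() waits for each run to finish
# before the next one starts. Returns the names whose run did not exit 0.
sub dispatch_dir {
    my ($dir, @cmd) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @files = sort grep { -f "$dir/$_" } readdir $dh;   # skips . and ..
    closedir $dh;

    my @failed;
    for my $name (@files) {
        my $rc = system(@cmd, "$dir/$name");
        push @failed, $name if $rc != 0;
    }
    return @failed;
}
```

Usage would be along the lines of `my @failed = dispatch_dir($prepdir, 'MailPush');`, run only once the splitter has closed all of its output files.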

Also, the existing process combined both functions (splitting and mailing). For input files over 10 MB, at some point the system("gzip...") and system("mv...") calls would fail with a -1 return code, which again points to a memory leak. The problem was alleviated somewhat when I replaced system("rm...") with unlink, but it still pops up intermittently, especially on input files over 100 MB.
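
The same idea as swapping system("rm...") for unlink can be applied to the mv and gzip steps. Both can run inside the Perl process using File::Copy and IO::Compress::Gzip, which ship with modern Perl releases, so no child has to be forked for them. A -1 from system() means the child could not be started at all, so this approach sidesteps that failure mode rather than diagnosing it. Below is a sketch; archive_file() is a hypothetical name. The gzip header written by IO::Compress::Gzip is not guaranteed to match `gzip -f -n` byte for byte.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(move);
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: move $infile into $arcdir and compress it there,
# without system("mv ...") or system("gzip ..."). Returns '' on success,
# otherwise a message describing the step that failed.
sub archive_file {
    my ($infile, $arcdir) = @_;
    my $target = "$arcdir/" . basename($infile);

    move($infile, $target) or return "move failed: $!";

    # Like the gzip command: write FILE.gz, then remove FILE.
    gzip($target => "$target.gz") or return "gzip failed: $GzipError";
    unlink($target) or return "unlink failed: $!";
    return '';
}
```

In the mailer, `my $err = archive_file($infile, $arcdir);` followed by `logit($err) if $err;` would take the place of the two system() calls.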

In reply to Re: Re: How to find a memory leak - appears to be "system" calls that are responsible by naum
in thread How to find a memory leak - appears to be "system" calls that are responsible by naum
