ezekiel has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I have included the guts of a script I have written that basically counts the frequency of alphabetic characters in a series of strings. It runs fine except, it has a memory leak, gradually using up memory until the machine expires. I have tried undef-ing everything in sight but still the problem occurs. Either I have missed something, in which case I would be grateful if a fresh pair of eyes could point it out, or perhaps there is a leak in the DBI module? Thanks for your help.

open (OUTPUT, ">>./output.txt") or die "Cannot open the output file: $ +!"; foreach my $id (@ids) { my $sth = $dbh->prepare("select long_string from my_table where id +=?"); $sth->execute($id); my $long_string = $sth->fetchrow_array(); $sth->finish(); undef($sth); my %count; my @letters = split("", $long_string); foreach my $letter (@letters) { $count{$letter}++; } undef(@letters); undef($long_string); # Print out the counts for each string print OUTPUT "$id\t" . ($count{A} || 0) . "\t" . ($count{B} || 0) . "\t" . ($count{C} || 0) . "\t" . ($count{D} || 0) . "\t" . ($count{E} || 0) . "\t" . ($count{F} || 0) . "\t" . ($count{G} || 0) . "\t" . ($count{H} || 0) . "\t" . ($count{I} || 0) . "\t" . ($count{J} || 0) . "\t" . ($count{K} || 0) . "\t" . ($count{L} || 0) . "\t" . ($count{M} || 0) . "\t" . ($count{N} || 0) . "\t" . ($count{O} || 0) . "\t" . ($count{P} || 0) . "\t" . ($count{Q} || 0) . "\t" . ($count{R} || 0) . "\t" . ($count{S} || 0) . "\t" . ($count{T} || 0) . "\t" . ($count{U} || 0) . "\t" . ($count{V} || 0) . "\t" . ($count{W} || 0) . "\t" . ($count{X} || 0) . "\t" . ($count{Y} || 0) . "\t" . ($count{Z} || 0) . "\n"; undef(%count); } close (OUTPUT) or die "Cannot close the output file: $!";

Replies are listed 'Best First'.
Re: Where are those memory leaks?
by jZed (Prior) on Mar 24, 2004 at 03:31 UTC
    You have the prepare inside the loop, it should be outside the loop. Prepare once before the loop, then execute many times inside the loop. Don't undef the $sth. If the id is a unique id, i.e. there is only one row to fetch, don't use finish at all (unless your DBD docs tell you to) - it is usually only required when there are more rows to fetch.

    Also, you didn't mention which DBD and RDBMS you are using or what platform you are on. It's unlikely that DBI itself is leaking to the extent you indicate but there may be issues with the DBD or the RDBMS or the platform.

      Thanks for your response.

      I tried with the prepare outside the loop and got an "Invalid cursor state" error, so put it back inside the loop. The id is unique. The finish and undef are only there in a desperate attempt to undef everything in trying to solve the memory leak :-)

      I am using DB2 (version 7.2) on IBM/AIX hardware/operating system with the DBD:DB2 driver

        I am using DB2 (version 7.2) on IBM/AIX hardware/operating system with the DBD:DB2 driver

        You use operating system very loosly there... =)


        -Waswas
Re: Where are those memory leaks?
by jaa (Friar) on Mar 24, 2004 at 09:26 UTC

    Well - memory leaks are are often pesky little beasties to catch!

    I would guess that the most likely candidate is the DBH - but you dont show exactly how you are opening the DBH. Make sure that PrintError and RaiseError are on. If necessary, put your DBH into trace mode.

    Also, you say that it runs fine, but gradually consumes memory until the machine expires - yet your example does not really show any sort of long-running loop? Perhaps the issue is actually in the loop construction?

    Self referencing and circular object references can also cause issues - but OOh - your example does not indicate that sort of style.

    In my experience, using undef to force vars out of scope is usually a mistake. There are some malloc cleanup bugs in Perls built with libc2.2.5 that can cause garbage collection to run for long periods.

    and because I couldn't resist, here is Another Way - there are no doubt even More Ways to remove intermediate vars. :)

    #!/usr/local/bin/perl -w use strict; use DBI; my @ids = @ARGV; die "no ids specified" unless scalar @ids; open (OUTPUT, ">>./output.txt") or die "Cannot open the output file: $ +!"; my $dbh; my $sql = "select long_string from mytable where id=?"; { my ( $dsn, $user, $password ) = ( "mydsn", "myuser", "mypassword", ); $dbh = DBI->connect( $dsn, $user, $password, {RaiseError => 1, AutoCommit => 0, PrintError => 1} ); die "DB Connect failed: DSN='$dsn' User='$user' Error='$DBI::errstr' +" unless $dbh; } my $sth = $dbh->prepare($sql); my %count; my @zeros = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0); foreach my $id (@ids) { my $long_string = uc($dbh->selectrow_array($sth,{},$id)); @count{'A'..'Z'} = @zeros; for my $i ( 0 .. length($long_string)-1 ) { $count{substr($long_string,$i,1)}++; } print OUTPUT join("\t", ($id, @count{'A'..'Z'})), "\n"; } close (OUTPUT) or die "Cannot close the output file: $!"; $dbh->disconnect();

    Oh - and did I mention perl -w and strict? Essentials! As an aside, your example only counts uppercase letters, and no other chars - I guess that you are counting on the quality of the input data.

    Regards

    Jeff

Re: Where are those memory leaks?
by kvale (Monsignor) on Mar 24, 2004 at 04:48 UTC
    The statement
    my $long_string = $sth->fetchrow_array();
    looks odd. $sth->fetchrow_array() returns an array, but you are evaulating that in scalar context, so $long_string contains the length of the array, which is always 1 for your select statement. A simpler example:
    % perl -e'@a=(2); my ($b) = @a; print "$b\n";' 2 % perl -e'@a=(2); my $b = @a; print "$b\n";' 1

    -Mark

      fetchrow_array called in scalar context returns either the first or the last element of the array depending on the DBD. In this case, since it is a SELECT COUNT, there is only one element so it's both the first and the last.
Re: Where are those memory leaks?
by DrHyde (Prior) on Mar 25, 2004 at 10:05 UTC
Re: Where are those memory leaks?
by periapt (Hermit) on Mar 25, 2004 at 13:17 UTC
    You know, I've always wondered about the loop declaration
    foreach my $letter (@letters){ $count{$letter}++; }
    I know that the scope of $letter is only in the foreach block but I've always wondered if $letter is completely returned to the program heap when it goes out of scope. I know it doesn't return all the way back to the OS but what about the program memory area? I'm not all that clear on perl's memory management internals but it has always seemed to me that there is the potential for heap memory fragmentation with the foreach my ... construct. Could the repeated allocation of the large array @letters followed by a much smaller allocation of $letters followed by some undef'ing and then reallocation be leaving some small bits of unusable memory in the heap? I've never actually crashed a program with this construct but I have always wondered (????) Is there a module to track this sort of thing?