Dear monks,
I study bioinformatics, and I recently wrote a parser to study the genome of T. cruzi. All it does is open a 1 GB file, go line by line through all the "words" in the file, and record each word's location and number of occurrences in a MySQL database.
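A "word" here is a fixed-length substring taken at every offset of a sequence line. The sketch below is not the poster's code. It is a minimal in-memory illustration that assumes FASTA-style input, where lines starting with `>` are headers, and a hypothetical word length of 3. It takes every offset whose full window fits in the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical word length; the original script uses $word_length.
my $word_length = 3;

# Small FASTA-style sample held in memory (assumption: ">" starts a header line).
my $fasta = ">seq1\nACGTAC\n>seq2\nGTACG\n";
open my $in, '<', \$fasta or die "cannot open in-memory file: $!";

my %count;        # word => number of occurrences
my @positions;    # one [word, header, offset] row per occurrence
my $header;       # header of the sequence currently being read

while (my $line = <$in>) {
    chomp $line;
    if ($line =~ /^>(.*)/) {    # header line: remember which sequence we are in
        $header = $1;
        next;
    }
    # Every start offset whose window of $word_length characters fits in the line
    for my $x (0 .. length($line) - $word_length) {
        my $word = substr($line, $x, $word_length);
        $count{$word}++;
        push @positions, [$word, $header, $x];
    }
}
close $in;

for my $word (sort keys %count) {
    print "$word\t$count{$word}\n";
}
```

With the sample input it prints ACG 2, CGT 1, GTA 2 and TAC 2. Like the original loop, it does not count words that span two lines of a wrapped sequence. Keeping `@positions` in memory for a 1 GB genome would likely need far more RAM than is available, which is why the real program writes those rows to the database instead.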

The problem is that while the program runs, its RAM usage keeps growing when it shouldn't. Since I expect the program to run for 7 days before I get my results, what should I do? Below is the code in which the RAM usage keeps climbing.

Thanks
Paulo Carvalho
LONG: while (<READ>) {
    $jump++;
    $counter2++;    # number of lines processed
    next LONG if $jump <= $rows2jump;    # skip lines that were already processed

    # Let's deal with the sequence headers here
    if (/^>/) {
        $h_title = $_;
        $h_title =~ s/^>//;
        # Check if the header exists;
        # if it doesn't, insert it into the head_index table
        $sth1->execute($h_title) or print "$DBI::errstr";
        # Get the last id
        $h_title_counter = $dbh->selectrow_array("SELECT MAX(id) FROM head_index");
        next LONG;
    }

    # Here we deal with the actual sequence
    chomp;    # remove the trailing newline (chop would also drop a base if the last line has no newline)
    my $line = $_;

    # Put every word into MySQL. Note: the "- 1" stops one offset short of the
    # last full-length word unless each line ends in an extra character (e.g. "\r")
    for my $x (0 .. length($line) - $word_length - 1) {
        my $word = substr($line, $x, $word_length);

        # Now we have a big cache; hopefully this speeds things up
        if ($most_used_words{$word}) {
            # UPDATE e_match_word SET counter = counter + 1 WHERE word = ?
            $sth3->execute($word) or print "$DBI::errstr";
            #print "cached result\n";
        }
        elsif ($dbh->selectrow_array("SELECT 1 FROM e_match_word WHERE word = ?", undef, $word)) {
            # UPDATE e_match_word SET counter = counter + 1 WHERE word = ?
            $sth3->execute($word) or print "$DBI::errstr";
            #print "not cached result\n";
        }
        else {
            # INSERT INTO e_match_word VALUES (?, ?)
            $sth2->execute($word, 1) or print "$DBI::errstr";
            $wordcounter++;
            #print "new word\n";
        }

        # INSERT INTO e_match_info VALUES (?, ?, ?)
        $sth4->execute($word, $h_title_counter, $x) or print "$DBI::errstr";
    }

    #$percent_done = int(($counter2 / $counter) * 100);
    $status = "Searching for exact patterns.. processing line $counter2 of $counter";
    $mw->update;
    exit if $stop;
}
close(READ);
}    # closes the enclosing block (not shown)
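To check whether memory really grows with every line the loop processes, one option is to log the process's resident memory periodically, for example next to the `$status` update. This is a minimal sketch that assumes Linux, because it reads `/proc/self/status`. The helper name `current_rss_kb`, the 100,000-line interval and the stand-in loop are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Resident set size of the current process in kB, or undef when
# /proc/self/status is not available (assumption: Linux procfs).
sub current_rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

# Stand-in for the main loop: report memory every $report_every lines.
# Both the interval and the loop bound are hypothetical.
my $report_every = 100_000;
for my $lines_done (1 .. 300_000) {
    # ... per-line parsing and database work would go here ...
    next if $lines_done % $report_every;
    my $rss = current_rss_kb();
    printf "line %d: RSS %s kB\n", $lines_done, defined $rss ? $rss : 'n/a';
}
```

If RSS climbs in step with the number of lines processed, the growth is tied to per-line work (the database calls, the Tk `$mw->update` call, or anything stored per line). Disabling those one at a time is a rough way to find which one is responsible.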

In reply to Memory Overflow by cav
