in reply to Memory Growth Problem
Running out of memory while doing n-gram counts on protein sequences would mean that you are processing lots of sequences, and that a newly created hash of n-gram counts is somehow being retained after each one. Since I have had occasion to use Text::Ngram myself, I wanted to check this carefully.
Please let me know if the following test script somehow falls short in terms of representing your particular usage, because as it stands, it does not replicate the memory leak:
(updated to include fixed-width numeric fields in the printf)

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::Ngram qw/ngram_counts/;

    $|++;    # unbuffer STDOUT so the progress line updates in place

    my @p = qw/a c g t/;

    # random 2048-character test sequence
    my $test_seq = join( '', map { $p[rand @p] } 0..2047 );
    my $counter = 0;

    while ( 1 ) {
        # $href is lexical to the loop body, so each hash can be freed per iteration
        my $href = ngram_counts( $test_seq );
        my $ngrams = scalar keys %$href;
        if ( ++$counter % 100 == 0 ) {
            printf( "found %4d 5-grams on iteration # %8d\r", $ngrams, $counter );
            # switch to a fresh random sequence every 100 iterations
            $test_seq = join( '', map { $p[rand @p] } 0..2047 );
        }
    }
No matter how long I let that run, it stays at a constant memory footprint, suggesting that Text::Ngram by itself does not have a memory leak. (I let it go over 200K iterations at 2048 characters each, which ought to be equivalent to processing about 400 MB of data.)
You didn't indicate what your code looks like after you stopped using that module, but I'm wondering if there might have been some other factor at play in creating (and then fixing) the memory leak.
I notice that the current version of Text::Ngram seems to date from June 2006, so you probably have that version. If you run my test script and it blows up on your machine, then there's probably something wrong with your particular installation of Text::Ngram. (I just did a fresh install on Mac OS X with perl 5.8.8.)
FWIW, I tried a variant of my test script, declaring an array outside the while loop and pushing the href onto the array at each iteration. The process grew to 1 GB of memory before it got to 36K iterations. (Update: the version as posted used a constant 19 MB of RAM, about the same size as a login bash shell on my Mac.)
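For what it's worth, here is a rough sketch of that retaining variant, not the exact script I ran. To keep it runnable without the module installed, it uses a hypothetical pure-Perl stand-in, count_ngrams(), in place of Text::Ngram's ngram_counts, and it stops after 200 iterations (an arbitrary number) instead of looping forever. The array name @retained is also just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Text::Ngram's ngram_counts(), assumed here only so
# the sketch runs without the module: counts overlapping windows of $width chars.
sub count_ngrams {
    my ( $seq, $width ) = @_;
    $width ||= 5;
    my %count;
    $count{ substr( $seq, $_, $width ) }++ for 0 .. length($seq) - $width;
    return \%count;
}

my @p = qw/a c g t/;
my @retained;    # declared outside the loop, so everything pushed here stays alive

for my $iteration ( 1 .. 200 ) {
    my $test_seq = join( '', map { $p[ rand @p ] } 0 .. 2047 );
    my $href     = count_ngrams($test_seq);

    # The extra reference held by @retained keeps each hash from being freed
    # when $href goes out of scope, so memory grows with every iteration.
    push @retained, $href;
}

my $total_keys = 0;
$total_keys += scalar keys %$_ for @retained;
printf "retained %d hashes holding %d keys in total\n",
    scalar @retained, $total_keys;
```

Drop the push and the footprint goes flat again, since nothing outside the loop body still refers to the hash. If your own code stashes the returned hashref anywhere that outlives the loop, this is the kind of growth I would expect.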