Without the code, you're pretty much going to get generalities. Having said that, the most likely culprit that comes to mind given the context is HTML::TreeBuilder, if you're using it to do the analysis. Its trees contain circular references, which won't get garbage collected correctly unless you call the delete method on the instance.
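For what it's worth, a minimal sketch of that pattern (the @html_files list and the extraction step are placeholders for whatever you're actually doing):

    use strict;
    use warnings;
    use HTML::TreeBuilder;

    for my $file (@html_files) {   # hypothetical list of pages to analyze
        my $tree = HTML::TreeBuilder->new_from_file($file);

        # ... extract whatever you need from $tree here ...

        $tree->delete;   # breaks the parent/child reference cycles so
                         # the tree's memory can actually be reclaimed
    }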
There could be circular references that the garbage collector can't reclaim. Maybe Devel::Cycle can help you find them.
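If you want to try it, find_cycle takes a reference and reports any cycles it can reach from it; a contrived example:

    use strict;
    use warnings;
    use Devel::Cycle;

    # A deliberate self-reference, just to show the output
    my $node = {};
    $node->{self} = $node;

    find_cycle($node);   # prints a description of each cycle it finds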
I ran into a similar, seemingly unavoidable problem with memory consumption when I was facing a huge number of Excel files, and decided to use Spreadsheet::ParseExcel to normalize/condense/combine the data from all of them. For each new Excel file that I opened, read, processed and closed, the module just kept taking up more memory, instead of re-using the space that was allocated for a previous file.
I decided to do a work-around, whereby I would process files until some reliable event occurred (e.g. changing directory, because there were never too many files in a single folder), write a "checkpoint" file to indicate how far I had gotten in the overall list, and exit. On start-up, the script would read the checkpoint file to figure out which directory to do next. Then it was just a matter of putting the script in a shell loop, running it enough times to cover the whole set.
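The skeleton looked roughly like this (names like checkpoint.txt, get_directories and process_directory are placeholders, and it assumes the directory list comes back in a stable sorted order):

    use strict;
    use warnings;

    my $checkpoint = 'checkpoint.txt';
    my $last_done  = '';
    if (open my $in, '<', $checkpoint) {
        chomp($last_done = <$in> // '');
        close $in;
    }

    for my $dir (get_directories()) {       # stable sorted order assumed
        next if $last_done && $dir le $last_done;   # already processed
        process_directory($dir);   # open, read, process, close each file
        open my $out, '>', $checkpoint or die "checkpoint: $!";
        print $out "$dir\n";
        close $out;
        exit 0;   # exit so the OS reclaims all memory; the outer
                  # shell loop starts a fresh process for the next dir
    }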
In your case:
- Does the database provide info that you need in order to decide which web pages to get? If not, segregate the LWP/HTML::Parser part from the MySQL part -- those two parts don't need to be in the same script. The page-fetch script could just output a tab-delimited text file, which could be loaded to the database via LOAD DATA INFILE.
- If the page fetch does depend on stuff being fetched from the database, you should still split the LWP and HTML parsing out into a separate process that does just one page at a time, and run it as a child of the MySQL process at each iteration. In this case, a script that takes a URL as a command-line arg, and prints string data suitable for mysql insertion to its STDOUT, could be run via back-ticks or via open( PROC, "-|", $script_name, $url ), as sketched below.
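Something like this on the parent side (fetch_one_page.pl and the insert step are placeholders):

    use strict;
    use warnings;

    for my $url (@urls) {   # however you get them from the database
        open(my $proc, '-|', 'perl', 'fetch_one_page.pl', $url)
            or die "can't start fetcher for $url: $!";
        while (my $row = <$proc>) {
            chomp $row;
            # ... insert $row into MySQL here ...
        }
        close $proc;   # the child exits, and the OS reclaims every byte
                       # that LWP and the HTML parser chewed up
    }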
Either way, most of your trouble comes from trying to do too much in one huge monolithic script. Break it down into simpler components -- that's likely to improve performance in a lot of ways, and will make it easier to maintain; it's a win-win approach.
Thanks to all for the quick help. Below is my report on the question.
talexb, I suspected my Perl program was consuming the memory because I used 'free -m' to watch free memory: when I ran the Perl program, free memory decreased very quickly and was not released after the program stopped.
Fletch, you got the point. I forgot to delete the tree. Since I called HTML::TreeBuilder many times, that caused serious memory waste. After I deleted the tree, the memory leak was almost solved.
When I say 'almost', I mean there is still a very slow memory leak, on the order of 1 MB every few minutes. graff is right: the trouble comes from my large script (1305 lines :P). I should break the script into smaller components.
I didn't try Devel::Cycle or Test::Memory::Cycle, since I didn't have complex reference structures.
What evidence do you have to support the hypothesis that your Perl program is causing the memory leak? Is your program running as a daemon? Can you disable or mock up portions of your program and see whether the memory leak persists or goes away?
I almost never worry about undefing variables to free them up -- I just allow them to fall out of scope, and Perl does the rest.
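For instance (load_huge_structure is just a placeholder), a lexical that falls out of scope is freed without any help:

    {
        my $big = load_huge_structure();   # placeholder for real work
        # ... use $big ...
    }
    # $big's refcount hit zero at the closing brace, so Perl has
    # already reclaimed it -- no explicit undef required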
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
The only time that I was bitten by a memory leak in Perl was when constructing a recursive function out of an anonymous sub reference, which is/was addressed by Sub::Recursive.
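The classic shape of that leak, plus the usual fix with Scalar::Util's weaken if you don't want to pull in Sub::Recursive (factorial here is just for illustration):

    use strict;
    use warnings;
    use Scalar::Util 'weaken';

    # The leak: the closure captures $leaky, and $leaky holds the
    # closure -- a cycle that reference counting can never free.
    my $leaky;
    $leaky = sub {
        my ($n) = @_;
        return $n <= 1 ? 1 : $n * $leaky->($n - 1);
    };

    # The fix: keep one strong reference outside, and weaken the
    # copy that the closure captures.
    my $fact = do {
        my $self;
        $self = sub {
            my ($n) = @_;
            return $n <= 1 ? 1 : $n * $self->($n - 1);
        };
        my $strong = $self;   # keeps the sub alive
        weaken($self);        # breaks the cycle
        $strong;
    };

    print $fact->(5), "\n";   # 120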
Every other time I ran into something like this, it turned out to be a mistake of my own, or code I'd written that was worse than usual.
The best thing to do is to create the smallest, simplest snippet of code that demonstrates the leak. This would serve as your "evidence".