slugger415 has asked for the wisdom of the Perl Monks concerning the following question:
Hello, this is my first post. I'm getting an "out of memory" message. I've looked at some of the previous posts on this subject (2007, 2005 and 2001) but am not sure if I can resolve my problem.
My script "crawls" a large website and builds a list of every page it can find by following a-href links, using HTML::TreeBuilder and a few other modules. The key part is that it saves each URL in a %ListOfURLs hash, which it checks so that it doesn't hit the same page twice.
I'm finding that when the hash grows past about 27,000 entries, I get the out-of-memory error. Am I just hitting some kind of memory/hash size limit?
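For reference, the "seen" hash pattern described above can be sketched with core Perl only. This is a minimal illustration, not the script in question: %fake_site and links_on() are hypothetical stand-ins for fetching a page and extracting its links. A hash of 27,000 short URL strings normally takes only a few megabytes, so the hash itself is probably not what exhausts memory.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real site: each URL maps to the links on that page.
my %fake_site = (
    '/'  => [ '/a', '/b' ],
    '/a' => [ '/b', '/' ],
    '/b' => [ '/c' ],
    '/c' => [ '/' ],
);

# Hypothetical helper: a real crawler would download and parse the page here.
sub links_on {
    my ($url) = @_;
    return @{ $fake_site{$url} || [] };
}

sub crawl {
    my ($start) = @_;
    my %seen  = ( $start => 1 );    # every URL already queued or visited
    my @queue = ($start);           # URLs still waiting to be visited
    while ( defined( my $url = shift @queue ) ) {
        for my $link ( links_on($url) ) {
            next if $seen{$link}++;    # skip anything seen before
            push @queue, $link;
        }
    }
    return sort keys %seen;
}

my @pages = crawl('/');
print "$_\n" for @pages;
```

Only plain strings are stored in %seen and @queue, so memory use grows with the number of distinct URLs rather than with the size of the pages.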
There are lots of other hashes and arrays created along the way, such as arrays of all hrefs on each page, e.g.:
my @aList = $tree->find_by_tag_name('a');
I've tried undef'ing those when they're no longer needed but it doesn't seem to make any difference.
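One possible explanation, which the post does not confirm: the HTML::Element documentation notes that trees built by HTML::TreeBuilder link parents and children in both directions, so on older releases undef'ing the variables does not free the tree, and $tree->delete has to be called explicitly. (Newer releases can instead be loaded with weak references enabled; check the documentation for the installed version.) A sketch that copies the href strings out and then deletes the tree, where hrefs_from_html() and the sample HTML are made up for illustration:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the href values as plain strings,
# then explicitly free the parse tree before returning.
sub hrefs_from_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Keep only strings; holding on to element objects would keep
    # the whole tree alive.
    my @hrefs = grep { defined }
                map  { $_->attr('href') }
                $tree->find_by_tag_name('a');

    $tree->delete;    # break the parent/child cycles so memory is released
    return @hrefs;
}

my @hrefs = hrefs_from_html(
    '<html><body><a href="/one">1</a> <a name="x">no href</a> '
  . '<a href="/two">2</a></body></html>'
);
print "$_\n" for @hrefs;
```

If every page's tree is deleted this way, only the list of URL strings should accumulate as the crawl proceeds.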
I'm happy to provide some code here but it's a pretty busy script. Any suggestions about how I might otherwise build a list to be checked against that would use less memory would be appreciated.
BTW on one post I saw a suggestion to use 'tie', but the documentation for tie speaks thusly:
"This function binds a variable to a package class that will provide the implementation for the variable. VARIABLE is the name of the variable to be enchanted."
To me that might as well say "Tie a shoelace around a shoebox and wave a magic wand over it." :-) I don't understand a word of it.
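In practical terms, tie makes an ordinary-looking hash read and write its entries through a module instead of holding them in memory. The sketch below ties a hash to an on-disk DBM file using SDBM_File, which ships with Perl; the file name and the example URLs are hypothetical. Whether this helps depends on the hash really being the memory problem, which is an assumption here.

```perl
use strict;
use warnings;
use Fcntl;                     # O_RDWR, O_CREAT
use SDBM_File;                 # simple DBM implementation bundled with Perl
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/seen_urls";   # hypothetical file name

# After this call, %seen behaves like a normal hash but lives on disk.
tie my %seen, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $file: $!";

my @new;
for my $url (qw(http://example.com/a http://example.com/b http://example.com/a)) {
    next if exists $seen{$url};    # same check as with an in-memory hash
    $seen{$url} = 1;
    push @new, $url;
}
print scalar(@new), " new URLs\n";

untie %seen;                   # flush and close the DBM file
```

The rest of the script would not need to change, since lookups and assignments use the usual hash syntax. SDBM_File limits each key/value pair to roughly 1 KB, so very long URLs would need a different backend such as DB_File.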
Thanks for any help you can provide.
Scott
Replies are listed 'Best First'.
Re: Another "out of memory!" problem
by ikegami (Patriarch) on Jun 22, 2010 at 23:18 UTC
Re: Another "out of memory!" problem
by BrowserUk (Patriarch) on Jun 23, 2010 at 01:15 UTC
Re: Another "out of memory!" problem
by jau (Hermit) on Jun 23, 2010 at 09:29 UTC
by slugger415 (Monk) on Jun 24, 2010 at 00:26 UTC
Re: Another "out of memory!" problem
by Plankton (Vicar) on Jun 22, 2010 at 23:13 UTC
Re: Another "out of memory!" problem
by Marshall (Canon) on Jun 22, 2010 at 23:56 UTC
Re: Another "out of memory!" problem
by slugger415 (Monk) on Jun 23, 2010 at 06:24 UTC
Re: Another "out of memory!" problem
by Cody Fendant (Hermit) on Jun 23, 2010 at 05:50 UTC