Hello, this is my first post. I'm getting an "out of memory" message. I've looked at some of the previous posts on this subject (2007, 2005 and 2001) but am not sure if I can resolve my problem.

My script "crawls" a large website and builds a list of all the pages it can find by following a href links, using HTML::TreeBuilder and a few other modules. The key part is that it saves each URL as a key in a %ListOfURLs hash, which it checks against so it doesn't hit the same page twice.
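
To make the description concrete, the "check against the hash" step looks roughly like this. This is a minimal sketch rather than the actual script: %fake_site and @queue are hypothetical stand-ins for the real fetching and parsing, and only %ListOfURLs is a name from the real code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real site: page URL => links found on that page.
my %fake_site = (
    '/'  => [ '/a', '/b' ],
    '/a' => [ '/', '/b', '/c' ],
    '/b' => [ '/a' ],
    '/c' => [],
);

my %ListOfURLs = ( '/' => 1 );   # every URL seen so far
my @queue      = ('/');          # URLs still waiting to be visited

while ( defined( my $url = shift @queue ) ) {
    # The real script fetches and parses the page here.
    my @links = @{ $fake_site{$url} || [] };

    for my $link (@links) {
        next if $ListOfURLs{$link}++;   # already recorded, so skip it
        push @queue, $link;             # first sighting, so visit it later
    }
}

print "$_\n" for sort keys %ListOfURLs;
```

Each URL is stored once as a hash key, so the hash itself grows with the number of distinct pages, not with the number of links encountered.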

I'm finding that once the hash grows past about 27,000 entries, I get the out-of-memory error. Am I just hitting some kind of memory or hash size limit?

There are lots of other hashes and arrays created along the way, such as arrays of all hrefs on each page, e.g.:

my @aList = $tree->find_by_tag_name('a');

I've tried undef'ing those when they're no longer needed but it doesn't seem to make any difference.

I'm happy to provide some code here, but it's a pretty busy script. Any suggestions for a less memory-hungry way to build a list I can check against would be appreciated.

BTW on one post I saw a suggestion to use 'tie', but the documentation for tie speaks thusly:

"This function binds a variable to a package class that will provide the implementation for the variable. VARIABLE is the name of the variable to be enchanted."

To me that might as well say "Tie a shoelace around a shoebox and wave a magic wand over it." :-) I don't understand a word of it.
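
For anyone else puzzled by that sentence, here is my best guess at what the 'tie' suggestion means in practice. It is only a sketch under assumptions: it uses SDBM_File (which ships with Perl) as the backing class, the file name seen_urls is made up, and the example URLs are placeholders. After the tie call, %ListOfURLs is used like a normal hash, but its contents live in a file on disk instead of in memory.

```perl
use strict;
use warnings;
use Fcntl;                      # provides O_RDWR and O_CREAT
use SDBM_File;                  # simple on-disk hash, bundled with Perl
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/seen_urls";    # hypothetical file name

# Bind the hash to the SDBM_File class: reads and writes now go to disk.
tie( my %ListOfURLs, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666 )
    or die "Cannot tie $file: $!";

$ListOfURLs{'http://example.com/'}      = 1;
$ListOfURLs{'http://example.com/about'} = 1;

# The usual "have I seen this URL?" check works unchanged.
my $seen_home    = exists $ListOfURLs{'http://example.com/'};
my $seen_contact = exists $ListOfURLs{'http://example.com/contact'};

print "home page already seen\n"   if $seen_home;
print "contact page not yet seen\n" unless $seen_contact;

untie %ListOfURLs;              # flush and close the file
```

One caveat as I understand it: SDBM limits each key/value pair to roughly 1 KB, so very long URLs would need a different backend such as DB_File.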

Thanks for any help you can provide.

Scott


In reply to Another "out of memory!" problem by slugger415