First, a site with well over 27,000 pages is likely to offer tools or a dedicated API for searching it. For example, here is one post about accessing PubMed: Re: CGI to query other websites. You may not wind up being very popular with the sysadmin if you really "beat the heck out of their site" within a short period of time.
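If you do end up crawling, spacing the requests out is the simplest way to stay on the sysadmin's good side. Here is a minimal sketch of that idea using only core modules. The name fetch_politely, the delay value and the stub fetcher are all made up for illustration; in a real crawler the callback would wrap whatever you already use to fetch a page (for instance LWP::UserAgent's get).

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper: call $fetch on each URL, leaving at least
# $delay seconds between the start of one request and the next.
sub fetch_politely {
    my ($urls, $delay, $fetch) = @_;
    my %result;
    my $last;    # start time of the previous request
    for my $url (@$urls) {
        if (defined $last) {
            my $wait = $delay - (time() - $last);
            sleep($wait) if $wait > 0;
        }
        $last = time();
        $result{$url} = $fetch->($url);
    }
    return \%result;
}

# Demo with a stub fetcher, so nothing here touches the network.
my $pages = fetch_politely(
    [ 'http://example.com/a', 'http://example.com/b' ],
    0.2,                                        # assumed delay in seconds
    sub { my ($url) = @_; return "content of $url" },
);
print "$_ => $pages->{$_}\n" for sort keys %$pages;
```

A fixed delay is the crudest possible policy. If the site publishes a robots.txt or usage guidelines, those should decide the rate, not this number.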

Another strategy would be to use a Google search to narrow down the number of pages and then search further on just those pages. It's not clear to me what you are doing and why you have to visit every single page on this large site. A more optimized strategy might be possible if you could tell us more about the application.

A hash with 27,000 URLs as keys doesn't sound large enough by itself to run out of memory, but it sounds like there are several other large structures as well. A DB is one possible answer if you really do need to collect this massive amount of information for this site. The Perl DBI is very good and plays very well with MySQL or SQLite.
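To give an idea of what the DB route looks like, here is a rough sketch with DBI and SQLite (it needs the DBD::SQLite driver installed). The file name crawl.sqlite, the pages table and its columns are assumptions on my part, since I don't know what you actually collect per page. The point is that each result goes to disk instead of accumulating in an in-memory structure.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical file name and table layout; adjust to your data.
my $db_file = 'crawl.sqlite';

my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", '', '',
    { RaiseError => 1, AutoCommit => 1 });

$dbh->do(<<'SQL');
CREATE TABLE IF NOT EXISTS pages (
    url    TEXT PRIMARY KEY,
    status INTEGER,
    title  TEXT
)
SQL

my $insert = $dbh->prepare(
    'INSERT OR REPLACE INTO pages (url, status, title) VALUES (?, ?, ?)');

# Stand-in for what the crawler would produce, one row per page.
my @results = (
    [ 'http://example.com/a', 200, 'Page A' ],
    [ 'http://example.com/b', 404, undef    ],
);

# One transaction for the whole batch is much faster than a commit per row.
$dbh->begin_work;
$insert->execute(@$_) for @results;
$dbh->commit;

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM pages');
print "stored $count pages\n";

$dbh->disconnect;
```

With the URL as primary key, re-running the crawl overwrites a page's row instead of duplicating it, and you can query the table later without loading everything back into a hash.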


In reply to Re: Another "out of memory!" problem by Marshall
in thread Another "out of memory!" problem by slugger415
