in reply to Another "out of memory!" problem

First, a site with way more than 27,000 pages is likely to have some tools or a special API that you can use for searching it. For example, here is one post about accessing PubMed: Re: CGI to query other websites. You may not wind up being very popular with the sysadmin if you really "beat the heck out of their site" within a short period of time.

Another strategy would be to use a Google search to narrow down the number of pages and then search further on just those pages. It's not clear to me what you are doing or why you have to visit every single page on this large site; a better-optimized strategy might be possible if you could give some more details about the application.

A hash with 27,000 URL keys doesn't sound large enough by itself to run out of memory, so it sounds like there are other large structures as well. A database is one possible answer if you really do need to collect this massive amount of information about the site. The Perl DBI is very good and plays very well with MySQL or SQLite.
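
As a rough sketch only (the urls.db file name and the table layout are made-up examples for illustration), DBI with DBD::SQLite would let you push the per-page data onto disk instead of holding it all in memory:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Keep page data in an on-disk SQLite database rather than a huge hash.
    my $dbh = DBI->connect("dbi:SQLite:dbname=urls.db", "", "",
                           { RaiseError => 1, AutoCommit => 1 });

    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS pages (
            url     TEXT PRIMARY KEY,
            status  INTEGER,
            content TEXT
        )
    });

    # Insert (or update) each page as you fetch it, so memory use stays
    # flat no matter how many pages you visit.
    my $sth = $dbh->prepare(
        "INSERT OR REPLACE INTO pages (url, status, content) VALUES (?, ?, ?)"
    );
    $sth->execute("http://example.com/page1", 200, "...page text...");

    # Later, query just what you need instead of walking a giant hash.
    my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM pages");
    print "Stored $count pages\n";

    $dbh->disconnect;

SQLite keeps everything in a single file and needs no server, which makes it handy for a one-off crawl like this; moving to MySQL later is mostly a matter of changing the connect string.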