in reply to Benign Web Miner

If you don't mind a non-Perl solution, you might consider Heritrix. It is capable of large-scale crawling, is kind to the hosts it visits (if a host takes n seconds to respond, it waits m*n seconds before hitting that host again; m is configurable but defaults to 5), and has extremely flexible crawl settings.
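
To make the m*n rule concrete, here is a minimal Python sketch of that kind of per-host adaptive delay. It is not Heritrix code and does not use its configuration names; `PoliteScheduler`, `record_fetch`, `wait_time`, and `delay_factor` are names made up for illustration, and the only thing carried over from the description above is the default factor of 5.

```python
from urllib.parse import urlsplit


class PoliteScheduler:
    """Tracks, per host, the earliest time the next request may be sent.

    Hypothetical illustration of the "wait m*n seconds" rule; not Heritrix's
    actual implementation.
    """

    def __init__(self, delay_factor=5.0):
        # m in the m*n rule (assumed default of 5, as described above).
        self.delay_factor = delay_factor
        # Maps host name -> timestamp before which we should not hit it again.
        self._next_allowed = {}

    def record_fetch(self, url, finished_at, duration):
        """Note that a fetch of `url` ended at `finished_at` and took `duration` seconds (n)."""
        host = urlsplit(url).hostname
        self._next_allowed[host] = finished_at + self.delay_factor * duration

    def wait_time(self, url, now):
        """Return how many seconds to wait before fetching `url` at time `now`."""
        host = urlsplit(url).hostname
        return max(0.0, self._next_allowed.get(host, 0.0) - now)


# Example: a host that took 2 seconds to respond is left alone for 10 seconds.
scheduler = PoliteScheduler()
scheduler.record_fetch("http://example.com/a", finished_at=100.0, duration=2.0)
remaining = scheduler.wait_time("http://example.com/b", now=103.0)
```

The point of scaling the pause by response time is that a slow (possibly overloaded) host automatically gets hit less often, while a fast host is crawled more quickly. Hosts are tracked independently, so waiting on one does not stall the others.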