in reply to Writing a Web Crawler

The WWW::Robot module already implements the logic you are trying to recreate. I'd recommend using it directly, but if you want to roll your own, read through its source code for ideas.

-Mark