Just a quick tip/opinion for you:
Try to avoid WWW::Robot as the basis for your spider. For unknown reasons, it tends to lock up after a few crawls; see
this for more info. I'm currently talking to the authors at Canon Labs and will update the latter link with any info I get.
Just my two pennies.
SMiTZ