A Search Crawler?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi All

I wish to build a script that checks whether a given URL is linked to anywhere on a particular website. Both the website and the URL being searched for are arbitrary. So my solution would be to build a web robot that crawls the site, link after link, until it finds that URL. If it does not find it, it should report that the URL was not found on the site. How would I start? Should I build a robot from scratch, or are there existing modules that I can use?

Another way might be to rely on Google, using a search query such as "site:address.com url-address.com", and then parse the results.

Also, if I were to go with the first idea and write a robot, would it use too many resources? For example, Google's crawler takes under a minute to crawl a site with 10-15 pages. Will I be able to make one as efficient as theirs?

Any other possible solutions?

Thanks,
Jasper

Re: A Search Crawler? by stonecolddevin (Parson) on Feb 23, 2007 at 19:09 UTC
I don't know how fast it is, but WWW::Robot seems like the logical choice in this situation. If you go through Google instead, the site's pages won't always be indexed, so a missing result doesn't prove the link isn't there. Hope this helps! meh.
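
For what it's worth, the crawl described in the question is a breadth-first search over the links that stay on the starting host. Here is a minimal sketch of that loop under some stated assumptions. It is written in Python with only the standard library, purely for illustration, and it is not WWW::Robot's API. The names `LinkParser`, `fetch_html`, `find_link`, `max_pages` and the User-Agent string are all made up for this example. It only looks at `<a href>` links, treats URLs that differ only by a fragment or trailing slash as the same, and does not check robots.txt or pause between requests, both of which a real robot should do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher: return the page body as text, or None on any failure."""
    try:
        # Hypothetical User-Agent string; use one that identifies your robot.
        req = Request(url, headers={"User-Agent": "link-finder-sketch/0.1"})
        with urlopen(req, timeout=10) as resp:
            if "html" not in resp.headers.get_content_type():
                return None
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def normalize(url):
    """Drop the #fragment and trailing slashes so near-identical URLs compare equal."""
    return urldefrag(url).url.rstrip("/")


def find_link(start_url, target_url, fetch=fetch_html, max_pages=200):
    """Breadth-first crawl of start_url's host.

    Returns the URL of the first page found that links to target_url,
    or None if no such page turns up within max_pages fetches.
    """
    host = urlparse(start_url).netloc
    target = normalize(target_url)
    queue = deque([start_url])
    seen = {normalize(start_url)}
    fetched = 0

    while queue and fetched < max_pages:
        page = queue.popleft()
        html = fetch(page)
        fetched += 1
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the page they appeared on.
            link = normalize(urljoin(page, href))
            if link == target:
                return page
            parts = urlparse(link)
            # Only queue unseen http(s) links that stay on the starting host.
            if (parts.scheme in ("http", "https")
                    and parts.netloc == host
                    and link not in seen):
                seen.add(link)
                queue.append(link)
    return None
```

Calling `find_link("http://address.com/", "http://url-address.com/")` would then return the page that carries the link, or `None` for the "not found on this site" case. The `max_pages` cap is there for the resource worry raised in the question: on a site of 10-15 pages the cost is essentially one HTTP request per page, so the network and any polite delay between requests dominate, not the script itself. The `fetch` parameter exists so the loop can be exercised against canned pages without touching the network.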