in reply to Testing a web crawler

I submitted an article to the Perl Journal (or was it the Perl Review...?) a while back about using Mojolicious to do this. Never did hear anything more. Hm. Well.

It boils down to it being really, really easy to make Mojo respond any way you want to a URL. You have your spider "visit" the Mojo server, fetch a first page with a bunch of links in it, and then test all the different kinds of things that could happen (timeout, 404, 500, you name it) by having each link point to an appropriately-crafted URL on that same Mojo server - so every failure scenario is reachable from the first page you crawl. You need one "that's all folks" URL to make the Mojolicious server shut down once the crawl is done, but that's easy enough to do.
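The idea above can be sketched with Mojolicious::Lite. This is just an illustration, not the article's actual code - the route names (`/slow`, `/quit`, etc.) are made up for the example, and the shutdown mechanics may differ across Mojolicious versions:

```perl
#!/usr/bin/env perl
# Sketch of a Mojolicious::Lite test server for exercising a crawler.
# Route names here are invented for illustration.
use Mojolicious::Lite;
use Mojo::IOLoop;

# The first page the crawler fetches: one link per scenario to test.
get '/' => sub {
  my $c = shift;
  $c->render(inline => <<'HTML');
<a href="/ok">fine</a>
<a href="/missing">404</a>
<a href="/broken">500</a>
<a href="/slow">timeout</a>
<a href="/quit">that's all folks</a>
HTML
};

get '/ok'      => sub { shift->render(text => 'all good') };
get '/missing' => sub { shift->render(text => 'not here', status => 404) };
get '/broken'  => sub { shift->render(text => 'boom',     status => 500) };

# Simulate a timeout: hold the response open until long after the
# crawler's client timeout should have fired.
get '/slow' => sub {
  my $c = shift;
  $c->render_later;
  Mojo::IOLoop->timer(60 => sub { $c->render(text => 'too late') });
};

# The "that's all folks" URL: answer, then stop the event loop shortly
# afterward so the test server goes away.
get '/quit' => sub {
  my $c = shift;
  $c->render(text => 'bye');
  Mojo::IOLoop->timer(0.5 => sub { Mojo::IOLoop->stop });
};

app->start;
```

Run it with something like `perl test_server.pl daemon -l http://*:3000`, point the spider at `http://localhost:3000/`, and assert that the crawler handled each link the way you expect.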

Re^2: Testing a web crawler
by dlarochelle (Sexton) on Mar 25, 2010 at 22:32 UTC

    pemungkah,

    Have you thought about turning your article into a blog post? I looked at Mojolicious, but it seemed complicated, so a guide to doing something like this would be useful. It would be a shame if the community weren't able to benefit from the work you did writing up the article.