in reply to Question: Fast way to validate 600K websites

I've encountered dynamic websites that only implement GET (or rather, they neglect to implement HEAD). Also, what do you mean by "validated"? That someone's listening on port 80? That the server returns a 2xx status code? That it's running HTTP/1.1?
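
If it's just "the server answers with something sane", a minimal sketch along these lines might do, using LWP::UserAgent with a HEAD-then-GET fallback. The 10-second timeout is an arbitrary choice of mine, and check_url is a made-up name:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(timeout => 10);   # don't wait forever on dead hosts

sub check_url {
    my ($url) = @_;
    my $res = $ua->head($url);
    # some servers neglect HEAD, so fall back to GET before declaring failure
    $res = $ua->get($url) unless $res->is_success;
    return $res->code;    # caller decides what counts as "validated"
}
```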

No matter what technique you adopt, you'll find most of your time is spent waiting for connections to be established (or to time out). You'll gain a lot by setting up a farm of workers to handle connections in parallel. Parallel::ForkManager is one way, but you'll probably get more mileage from LWP::Parallel.
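
For the fork-based route, here's roughly what it looks like with Parallel::ForkManager. The worker count of 50 and the timeout are guesses you'd want to tune, and I'm assuming one URL per line on stdin:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(50);    # up to 50 concurrent children
my $ua = LWP::UserAgent->new(timeout => 10);

while (my $url = <>) {
    chomp $url;
    $pm->start and next;                    # parent: fork a child, move on to the next URL
    my $res = $ua->head($url);
    $res = $ua->get($url) unless $res->is_success;   # servers that neglect HEAD
    printf "%s\t%s\n", $url, $res->code;
    $pm->finish;                            # child exits here
}
$pm->wait_all_children;
```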

Oh, and I don't understand your question about 0.00. Can you post a snippet demonstrating the problem?

update: I just thought of another thing; this reminds me of something I once wrote. The first step is to see whether the host itself is still around. Extract the host name from the URI and see if you can resolve its address (you'll have to do this anyway, and with a bit of luck you'll warm up your DNS cache in the process). If you can't even resolve the name to an A or CNAME record, there's no point in trying to fetch the page.

Normally you'll get a negative response back from a DNS server much faster than you'll get a connection timeout from a web server that's just not there any more.
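
A sketch of that pre-check using Net::DNS and URI; host_resolves is a made-up name, and I'm treating "no answer to an A query" as "host is gone":

```perl
use strict;
use warnings;
use Net::DNS;
use URI;

my $dns = Net::DNS::Resolver->new;

# Returns true if the URI's host resolves (the resolver follows CNAMEs to an A record)
sub host_resolves {
    my ($url) = @_;
    my $host = eval { URI->new($url)->host } or return 0;   # eval: some URI schemes have no host
    return defined $dns->query($host, 'A');                 # undef means no such record
}
```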

• another intruder with the mooring in the heart of the Perl
