PerlMonks  

Re^2: Question: Fast way to validate 600K websites

by lihao (Monk)
on May 12, 2008 at 19:35 UTC ( [id://686133]=note )


in reply to Re: Question: Fast way to validate 600K websites
in thread Question: Fast way to validate 600K websites

Hi, guys:

Thank you all for the helpful suggestions :-) I am actually trying to check whether 600K listed domain names are reachable. Many of them are just garbage, like '0.00' or 'hotmailll.com', so I need to either discard them (e.g. '000.0.com') or correct them (e.g. from 'hotmaillll.com' to 'hotmail.com'). Right now I have not yet considered sites that disable the 'HEAD' method. At this stage, I will just filter the 'NOT valid' sites out into a list and then do more research on that smaller list. :) Most of the information I have gotten from this thread so far is very helpful. Thanks again :)
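A minimal sketch of the discard-or-correct pass described above, written in Python for illustration. The `KNOWN_DOMAINS` list, the 0.85 similarity cutoff, and the function names are all assumptions, not part of the original post. Note that a purely syntactic check only catches entries like '0.00'; something like '000.0.com' is well-formed and would still need a DNS or HTTP check to be ruled out.

```python
import difflib
import re

# Hypothetical reference list of real domains to correct typos against.
KNOWN_DOMAINS = ["hotmail.com", "gmail.com", "yahoo.com"]

# One hostname label: 1-63 letters, digits, or hyphens,
# neither starting nor ending with a hyphen.
LABEL_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_plausible_domain(name):
    """Syntactic check only; says nothing about whether the name resolves."""
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    if len(labels) < 2:
        return False
    if not all(LABEL_RE.fullmatch(label) for label in labels):
        return False
    # Require a letter in the last label, so that "0.00" is rejected.
    return any(c.isalpha() for c in labels[-1])


def classify(raw):
    """Return ("discard", None), ("correct", fixed) or ("keep", name)."""
    name = raw.strip().lower().rstrip(".")
    if not is_plausible_domain(name):
        return ("discard", None)
    if name in KNOWN_DOMAINS:
        return ("keep", name)
    # Assumed cutoff: high enough to avoid "correcting" unrelated domains.
    close = difflib.get_close_matches(name, KNOWN_DOMAINS, n=1, cutoff=0.85)
    if close:
        return ("correct", close[0])
    return ("keep", name)
```

Names that come back as "keep" are only candidates; they would still go on to the reachability check, while "correct" results probably deserve a manual look before being trusted.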

lihao


Replies are listed 'Best First'.
Re^3: Question: Fast way to validate 600K websites
by leocharre (Priest) on May 12, 2008 at 20:38 UTC
    If you want to know whether the URI is actually reachable, would a simple POSIX 'ping' help you?
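
A rough sketch of what such a ping check might look like, in Python for illustration. The function names and the 5-second timeout are assumptions, and the `-c` packet-count flag is the one used by the Linux, macOS, and BSD `ping` utilities (Windows uses `-n`). Keep in mind that ping only tests ICMP reachability: many hosts drop ICMP while serving HTTP fine, so a failed ping does not prove a website is dead.

```python
import subprocess


def ping_command(host, count=1):
    """Build the argument list for a single-packet ping (Unix-style flags)."""
    return ["ping", "-c", str(count), host]


def is_pingable(host, timeout=5, run=subprocess.run):
    """Return True if ping exits with status 0 within `timeout` seconds.

    `run` is injectable so the function can be exercised without a network.
    """
    # Refuse names that the ping utility could mistake for an option.
    if not host or host.startswith("-"):
        return False
    try:
        result = run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the ping binary is missing or not executable.
        return False
    return result.returncode == 0
```

For 600K names, running these one at a time would be slow, so in practice the calls would need to be issued in parallel.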
