in reply to Parallel WEB spider, DB support
Threaded solutions are possible, easy and very scalable.
Niceties like not hitting the same server too frequently, downloading a given URL only once, and not retrying failing servers many times over are all made easier by shared state.
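To make the shared-state point concrete, here is a minimal sketch in Python (not the original poster's code, and not tied to any particular spider). The class name `CrawlState`, its methods, and the `min_delay` / `max_failures` parameters are all hypothetical names chosen for illustration. One lock-protected object tracks which URLs have been claimed, when each host was last hit, and how often each host has failed.

```python
import threading
import time
from urllib.parse import urlsplit


class CrawlState:
    """Hypothetical shared bookkeeping for a pool of crawler threads."""

    def __init__(self, min_delay=1.0, max_failures=3, clock=time.monotonic):
        self._lock = threading.Lock()
        self._seen = set()       # URLs already claimed by some thread
        self._last_hit = {}      # host -> time of the most recent request slot
        self._failures = {}      # host -> number of recorded failures
        self.min_delay = min_delay        # assumed politeness delay, in seconds
        self.max_failures = max_failures  # assumed give-up threshold per host
        self._clock = clock               # injectable so the logic can be tested

    def claim(self, url):
        """Return True only for the first caller that asks for this URL."""
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
            return True

    def wait_time(self, url):
        """Seconds to wait before hitting this URL's host.

        Returns None if the host has failed too often. A return of 0.0 also
        reserves the slot, so two threads cannot both hit the host at once.
        """
        host = urlsplit(url).netloc
        with self._lock:
            if self._failures.get(host, 0) >= self.max_failures:
                return None
            now = self._clock()
            last = self._last_hit.get(host)
            if last is not None and now - last < self.min_delay:
                return self.min_delay - (now - last)
            self._last_hit[host] = now
            return 0.0

    def record_failure(self, url):
        """Count one failed request against the URL's host."""
        host = urlsplit(url).netloc
        with self._lock:
            self._failures[host] = self._failures.get(host, 0) + 1
```

A worker thread would roughly call `claim(url)` first, skip the URL if that returns False, then loop on `wait_time(url)` (sleeping while it is positive, dropping the URL if it is None), fetch the page, and call `record_failure(url)` on error. Because every thread sees the same object, none of this needs a database or inter-process messaging.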