We had an early 4-CPU SMP box (real CPUs, not cores) with a shared 1Gbps link direct to a backbone. We easily saturated that link with 32 threads running bog-standard LWP and Digest::MD5, provided we didn't store the data to disk. Even with RAIDed disks (RAID 5, I think), the bottleneck was storing what we could read. That was circa 7 years ago.
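For anyone who wants to try the fetch-and-hash-but-don't-store setup, here is a minimal sketch of roughly what that looks like. It is not the code we ran. The worker count of 32 comes from above, but the Thread::Queue plumbing, the 30-second timeout, the sub names (`digest_body`, `worker`) and taking URLs from the command line are all my assumptions for illustration. It also assumes a threads-enabled perl and Thread::Queue 3.01 or later for `end`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use LWP::UserAgent;
use Digest::MD5 qw(md5_hex);

my $WORKERS = 32;    # thread count from the post; tune for your hardware

# Hash a fetched body and return only the digest, so nothing goes to disk.
sub digest_body {
    my ($body) = @_;
    return md5_hex($body);
}

# Worker loop: pull URLs until the queue is ended, fetch each, print its digest.
sub worker {
    my ($queue) = @_;
    my $ua = LWP::UserAgent->new( timeout => 30 );    # one agent per thread (assumed timeout)
    while ( defined( my $url = $queue->dequeue ) ) {
        my $resp = $ua->get($url);
        if ( $resp->is_success ) {
            # content() is the raw byte string, which is what md5_hex expects
            printf "%s %s\n", digest_body( $resp->content ), $url;
        }
        else {
            warn "$url: ", $resp->status_line, "\n";
        }
    }
    return;
}

my $queue = Thread::Queue->new;
my @pool  = map { threads->create( \&worker, $queue ) } 1 .. $WORKERS;

$queue->enqueue(@ARGV);    # URLs given on the command line (hypothetical input source)
$queue->end;               # dequeue returns undef once the queue drains
$_->join for @pool;
```

Run it as `perl fetchhash.pl http://example.com/ ...` (the script name is made up). The only thing kept per page is a digest line on STDOUT, which is why the disks never enter the picture.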
To do the job properly at the scale you are talking about, you'd need to run a distributed crawler: each node with dedicated, high-speed, RAIDed local drives (or hugely expensive SSD arrays), plus a distributed queueing mechanism.
In reply to Re^15: Async DNS with LWP
by BrowserUk
in thread Async DNS with LWP
by jc