in reply to Re^15: Async DNS with LWP
in thread Async DNS with LWP
Re^17: Async DNS with LWP
by BrowserUk (Patriarch) on Oct 09, 2010 at 13:32 UTC
A 1 Gbps connection was saturated by downloading only 32 web pages at a time?

Instantaneously, yes. And for a substantial proportion of the time, assuming the servers we were connected to, and their connections, were able to supply their data at the required rate. Obviously, if the mix of servers at any given point in time were all 386-class machines in people's back bedrooms, connected via 14.4k modems--not so uncommon back then--then throughput falls off. But usually you had a random mix of good and bad servers, and that was sufficient to max out the bandwidth available.

Remember I mentioned the 1 Gbps was shared--if I remember correctly, by 20 other hosts. Mostly they seemed to be using very little of the bandwidth. Probably low-volume websites running "Mum&Pop's Potpourri Emporium Inc." or "HaKzOr23's CRucIal SeCurITy SiGht". We weren't party to what they were, or what bandwidth they were using, but the hoster's ControlPanel app showed us our usage, for which we were billed.

By way of a convincer: the following two trivial scripts run as (2) servers and (2) clients on my 4-CPU machine. I set the affinities so that 2 cores run the two server threads and 2 the two client threads. All they do is connect to each other and shovel large lumps of data through from server to client as fast as they can:

Server:
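(The original script isn't reproduced here; a minimal sketch of the server side, assuming Perl ithreads, IO::Socket::INET, and arbitrary ports 9901/9902, might look like this.)

```perl
#!/usr/bin/perl
# Minimal sketch of the server side: two threads, each accepting one
# connection and pushing large buffers down it as fast as possible.
# Port numbers and buffer size are assumptions, not the original values.
use strict;
use warnings;
use threads;
use IO::Socket::INET;

$SIG{PIPE} = 'IGNORE';                 # let print() fail instead of killing us

my $CHUNK = 'x' x ( 1024 * 1024 );     # 1 MB lump of data to shovel repeatedly

sub serve {
    my ($port) = @_;
    my $listen = IO::Socket::INET->new(
        LocalPort => $port,
        Listen    => 1,
        ReuseAddr => 1,
    ) or die "listen($port): $!";

    my $conn = $listen->accept or die "accept: $!";

    # Shovel data until the client goes away.
    while ( print {$conn} $CHUNK ) { }
    close $conn;
}

# Two server threads, one per port.
my @servers = map { threads->create( \&serve, $_ ) } 9901, 9902;
$_->join for @servers;
```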
Clients:
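(And a corresponding sketch of the client side, with the main thread doing the per-second throughput reporting; again, ports and buffer sizes are assumptions.)

```perl
#!/usr/bin/perl
# Minimal sketch of the client side: two threads each read from a server
# as fast as they can; the main thread reports combined throughput once
# a second. Ports and buffer sizes are assumptions.
use strict;
use warnings;
use threads;
use threads::shared;
use IO::Socket::INET;

my $bytes :shared = 0;

sub drain {
    my ($port) = @_;
    my $conn = IO::Socket::INET->new(
        PeerAddr => 'localhost',
        PeerPort => $port,
    ) or die "connect($port): $!";

    my $buf;
    while ( my $got = sysread( $conn, $buf, 1024 * 1024 ) ) {
        lock $bytes;
        $bytes += $got;
    }
}

my @clients = map { threads->create( \&drain, $_ ) } 9901, 9902;

# Main thread: report throughput once a second.
while ( threads->list(threads::running) ) {
    sleep 1;
    my $mb;
    { lock $bytes; $mb = $bytes / ( 1024 * 1024 ); $bytes = 0; }
    printf "%.1f MB/s\n", $mb;
}
$_->join for @clients;
```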
That data doesn't go via the internet (my broadband connection is 300kbps at best); but it does go via the TCP stack and is therefore subject to all the handshaking, coalescing and buffering that a proper IP connection goes through. The main thread in the clients' script monitors the throughput on a per-second basis. Here's a typical snapshot of that:
The CPU usages whilst all that data is flying about are about 12% each for the servers, and 5% each for the clients. The throughput varies up and down a bit, between say 50 MBytes/s and 58 MBytes/s, but 53/54 is the norm.

Remember, for 32 threads to sustain a combined throughput of 1 Gbps (roughly 120 MBytes/s), each thread has only to achieve around 4 MBytes/s. Obviously, overall throughput at any given point will depend upon the mix of large and small files; good and bad servers; general network load; number of hops; and myriad other factors. But throwing ever larger numbers of threads at the problem has rapidly diminishing returns. 4 threads per CPU seemed optimal on that system at that time. 8 per CPU sometimes improved overall throughput, but that was mostly negated by the effects of thrashing the disks harder by writing to twice as many files concurrently.

That's why I say that you have to consider the complete system. And also why async DNS doesn't make much difference.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
by jc (Acolyte) on Oct 09, 2010 at 17:24 UTC
I see what you mean. In fact, I was toying with the idea of the following architecture to maximise throughput:

* Set up asynchronous DNS to quickly resolve all of the 90,000,000 domains and find out which domains reside at the same IP (shared-hosting domains) -- see the sketch below.
* Send out a TCP ACK to each web server (asynchronously) to get a shortlist of which domain names actually have a web server that responds (it should send a RST in response to an unexpected ACK).
* Then send out TCP connects with a short timeout to shortlist the servers which respond faster.
* To those that respond fast enough, send out HEAD requests to obtain document sizes.
* Asynchronously GET the smallest documents first, so that the database of links experiences the fastest possible growth.

Any thoughts?
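(No code was posted for this, but a minimal sketch of the first step -- resolving a batch of domains concurrently and bucketing them by IP -- might look like the following, assuming AnyEvent::DNS is available; the domain list and output format are placeholders.)

```perl
#!/usr/bin/perl
# Hypothetical sketch: resolve a batch of domains concurrently and bucket
# them by IP address to spot shared hosting. Assumes AnyEvent::DNS is
# installed; the domain list is a placeholder.
use strict;
use warnings;
use AnyEvent;
use AnyEvent::DNS;

my @domains = qw( example.com example.net example.org );    # placeholder list
my %by_ip;                                                  # IP => [ domains ]

my $cv = AnyEvent->condvar;
$cv->begin;    # hold the condvar open until every lookup has been issued

for my $domain (@domains) {
    $cv->begin;
    AnyEvent::DNS::a $domain, sub {
        my @addrs = @_;    # A records; empty list on failure
        push @{ $by_ip{$_} }, $domain for @addrs;
        $cv->end;
    };
}

$cv->end;
$cv->recv;     # wait for all outstanding lookups to complete

for my $ip ( sort keys %by_ip ) {
    printf "%-15s %s\n", $ip, join ', ', @{ $by_ip{$ip} };
}
```

For 90,000,000 domains you would also have to throttle the number of outstanding queries and spread them over your own resolvers, which is exactly the objection raised in the reply below.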
by BrowserUk (Patriarch) on Oct 09, 2010 at 20:06 UTC
Any thoughts?

Yes. You are fixated on asynchronous DNS. If you sent out concurrent DNS requests for 90 million URLs, your DNS server/provider would almost certainly blacklist you instantly.

Synchronous DNS is never a factor in throughput after the first second of runtime, because any time one thread spends waiting for DNS, one or more others will be utilising the processor and bandwidth downloading.

You're approaching the whole problem the wrong way: you're trying to optimise things before you actually have any idea of where the bottlenecks are.
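(To illustrate that overlap, here is a minimal sketch of a plain thread pool using threads, Thread::Queue and LWP::UserAgent. The pool size, URL source and save_page() helper are placeholders, not code from this thread.)

```perl
#!/usr/bin/perl
# Minimal sketch of a worker pool in which the blocking DNS lookup inside
# one worker's get() only stalls that worker; the other workers carry on
# downloading in the meantime. Pool size, URL source and save_page() are
# placeholders.
use strict;
use warnings;
use threads;
use Thread::Queue;
use LWP::UserAgent;

my $POOL = 32;                      # e.g. 4 threads per CPU on an 8-core box
my $q    = Thread::Queue->new;

sub save_page {                     # stub; real code would write to disk/db
    my ( $url, $html ) = @_;
    printf "%7d bytes  %s\n", length $html, $url;
}

sub worker {
    my $ua = LWP::UserAgent->new( timeout => 30 );
    while ( defined( my $url = $q->dequeue ) ) {
        my $res = $ua->get($url);   # synchronous DNS + connect + download
        next unless $res->is_success;
        save_page( $url, $res->decoded_content );
    }
}

my @workers = map { threads->create( \&worker ) } 1 .. $POOL;

# Feed URLs from stdin, then tell each worker to finish.
while ( my $url = <STDIN> ) {
    chomp $url;
    $q->enqueue($url) if length $url;
}
$q->enqueue( (undef) x $POOL );     # one terminator per worker
$_->join for @workers;
```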
by jc (Acolyte) on Oct 09, 2010 at 21:13 UTC
by BrowserUk (Patriarch) on Oct 10, 2010 at 08:50 UTC
by roboticus (Chancellor) on Oct 10, 2010 at 17:07 UTC
jc:

To amplify BrowserUk's point: too many people want to create an "optimal" solution right out of the gate. But computer and network behaviour is so complicated that, without information, you can't determine *what* to optimize, nor which behaviour you may need to fix.

Remember: first, just try coding the simplest thing you can that works. After you've made it work correctly, ask: is it fast/good enough? If so, you're done--and with far less work! Only if it's not fast/good enough do you need to make any improvements.

To improve it, first measure to figure out what needs improvement: if you just guess, you're likely to be wrong, and you'll waste your time. Look at your measurement results to see where you can get the most improvement, make the improvement, and check whether you're done. If not, take more measurements, choose the next chunk of code, and so on.

How do you know when you're done? If at all possible, choose a performance goal. Once you meet it, you're done. Sometimes you'll find that you must accept worse performance than you planned (if you can't improve the performance enough), or you'll have to investigate better algorithms, faster hard drives, more memory, etc.

...roboticus
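(As a concrete example, a rough first measurement -- timing the DNS lookup and the download separately for a few URLs -- shows which one actually dominates. This sketch assumes Time::HiRes, Socket, URI and LWP are available; the URL list is a placeholder.)

```perl
#!/usr/bin/perl
# Rough measurement sketch: time the DNS lookup and the full fetch
# separately for a handful of URLs, to see which actually dominates.
use strict;
use warnings;
use Time::HiRes qw( time );
use Socket qw( inet_aton );
use URI;
use LWP::UserAgent;

my @urls = qw( http://www.example.com/ http://www.example.org/ );
my $ua   = LWP::UserAgent->new( timeout => 30 );

for my $url (@urls) {
    my $host = URI->new($url)->host;

    my $t0 = time;
    my $ip = inet_aton($host);      # blocking DNS lookup (undef on failure)
    my $t1 = time;
    my $res = $ua->get($url);       # connect + download (repeats the, now
    my $t2 = time;                  # usually OS-cached, lookup)

    printf "%-30s dns %6.3fs  fetch %6.3fs  (%s)\n",
        $host, $t1 - $t0, $t2 - $t1, $res->status_line;
}
```

The split is only indicative, since the fetch repeats its own (usually cached) lookup; a profiler such as Devel::NYTProf would give a finer-grained picture once the simple version is working.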
by jc (Acolyte) on Oct 10, 2010 at 20:14 UTC