Are you intending to move this to a beefier box at some point in the future? If so, what spec of box and what bandwidth will it have?
If not, what bandwidth do you have on the current box?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Yes! Definitely! Ideally I'd want as much bandwidth as possible, and lots and lots of memory. I'm negotiating with my university, but that is unlikely to get anywhere fast, so it looks like I'm going to have to fork out for a dedicated server myself. Since I'm concentrating on .com domains at the moment, the geographical location and connectivity of the box could be important. I was thinking of something like http://www.m5hosting.com/ValueNet.php These guys also seem to be the only ones I can find offering decent OpenBSD dedicated servers. With 1,500 GB/month transfer and a burstable speed of up to 100Mbps at no additional charge, it looks like as good a deal as I'm likely to find. According to http://www.m5hosting.com/network.php they seem to be pretty well connected too, especially for US-based traffic. I've also used these guys before and their service was pretty good: they fix problems fast and don't charge for it.

As it stands, though, I'm connected to the internet through a USB modem via a mobile phone operator that offers variable bandwidth (depending on where you are) and limits me to 100 hours per month. In any case, the time I can spend crawling the net is always ultimately going to be limited by bandwidth and how well I use it. Asynchronous DNS and HTTP seem to be the fundamental issues. I'm almost at the point of thinking: why didn't I just code this in C from the very beginning? Years ago I wrote an asynchronous DNS resolver in C (I no longer have the code) and, yes, it did take much longer to implement, but as far as I can remember it ran as fast as the 100Mbps burstable connection could carry it. In fact, it ran so well that we had to redirect it to a cluster of load-balanced DNS servers so that they could keep up with the requests. I've even thought that maybe it's worth writing my own recursive resolver to see whether that process can be optimised.
Maybe this is overkill, but I want this to finish in hours (maybe days), certainly not in years.
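For what it's worth, the idea behind that old C resolver, issuing many lookups concurrently instead of waiting on each one in turn, can be sketched in a few lines. This is just an illustration (in Python, not the original C), and the host list is a placeholder:

```python
import asyncio
import socket

async def resolve(host):
    """Resolve one hostname without blocking the event loop."""
    loop = asyncio.get_running_loop()
    try:
        # getaddrinfo runs on the loop's executor, so many lookups
        # can be in flight at once rather than one at a time.
        infos = await loop.getaddrinfo(host, 80, type=socket.SOCK_STREAM)
        return host, infos[0][4][0]
    except socket.gaierror:
        return host, None

async def resolve_all(hosts):
    """Fire off all lookups concurrently and gather the results."""
    return await asyncio.gather(*(resolve(h) for h in hosts))

if __name__ == "__main__":
    # placeholder list; a real crawler would feed millions of names here
    for host, addr in asyncio.run(resolve_all(["localhost"])):
        print(host, addr)
```

The win is purely in overlapping the round-trip latency of many queries; it does nothing for raw bandwidth, which is the point the reply below makes.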
I'm sorry, but worrying about async DNS at this point is ... well, pointless. Let's do some math.
With your current setup: 90e6 sites at, say, an average of 100k per home page(*) is 9e12 bytes, about 9 TB. To download that lot in your 100-hour allocation, you'd need to be fetching constantly at a rate of 25 Mbytes/s, which would (conservatively) require a 250Mbps connection. To do it in your target 3 hours you'd need an 8 Gbps connection.
Now, I'm not sure what data rates you can achieve with GSM (GPRS/EGPRS) in the US, but I'm pretty sure they'll be measured in tens of kbps. Not Mbps, much less Gbps.
Even once you've moved to your hoster, if you could sustain their 100Mbps burst rate indefinitely, 90e6 * 100k (about 9 TB) would take ~250 hours to download. And they'd cut you off long before that.
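The back-of-envelope numbers above are easy to check. A quick sketch (the 90e6-site and 100k-per-page figures are the ones from this thread):

```python
SITES = 90e6          # sites to crawl
PAGE = 100e3          # rough average bytes per home page
TOTAL = SITES * PAGE  # 9e12 bytes, ~9 TB

def mbps_needed(hours):
    """Sustained megabits/s needed to move TOTAL bytes in `hours`."""
    return TOTAL * 8 / (hours * 3600) / 1e6

def hours_needed(mbps):
    """Hours to move TOTAL bytes at a sustained `mbps` link rate."""
    return TOTAL * 8 / (mbps * 1e6) / 3600

print(mbps_needed(100))   # 200 Mbps sustained for the 100-hour budget
print(mbps_needed(3))     # ~6700 Mbps for a 3-hour crawl
print(hours_needed(100))  # 200 hours flat-out at 100 Mbps
```

The 250 Mbps, 8 Gbps and ~250-hour figures quoted above are these raw results rounded up to allow for protocol overhead and less-than-perfect link utilisation.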
Worrying about shaving a few milliseconds here and there using asynchronous DNS is just a drop in the ocean.
(*)They seem to range from the minimalist google at 8k, up to the commercial bloat of sky.com at 250k; but 100k is a good average of the few I looked at.