Someone once said that a web application is truly scalable if all you need to do to handle more load is buy more machines. That idea is very appealing to me. Imagine you start a website with a single box running both the web server and the database server; then, as your user base grows, you add servers and load balancers; and eventually you add data centers (hosting on both the west and east coasts, in Europe, etc.). Ideally all of this would be done in a high-availability fashion.

My question is: what are the main design decisions needed to accomplish this? For example, not storing anything in sessions (though load balancers can be sticky)? Clusters of web servers and database servers? How does database replication/synchronization work across data centers (how fast and how expensive is it)? How do the big boys (Google, Yahoo, eBay, Amazon, etc.) accomplish their scalability? Is there anything special about Perl/mod_perl one can take advantage of, or needs to be careful about? Thanks.

Re: Hardware scalable web architecture
by BrowserUk (Patriarch) on Apr 10, 2006 at 07:41 UTC

    There is a pretty good high-level overview of the architecture Google uses at (pdf).


Re: Hardware scalable web architecture
by fergal (Chaplain) on Apr 10, 2006 at 09:49 UTC

    "truly scalable" basically means paralellisable - that is each machine can operate independently of each other machine. There aren't so many interesting web apps where this is true. For example, anything where users can create content that needs to be visible by all other users will require a database that is read by all machines and this will be your bottleneck to scalability.

    An example of something that is truly scalable is a news site (excluding comments). You simply push the new content out to each machine and let them serve it. If you need more capacity, you add more machines. They can even be in different parts of the world.

    You can mix the two situations to help with scaling. For example, you might have 100 main web servers which weave together static and dynamic content. They don't talk to the database; that's left to the 20 dynamic content servers, which generate HTML and pass it to the main web servers. They might also do caching, etc. This way you limit the number of machines holding open DB connections. It doesn't matter so much how your static content grows (you could start serving videos instead of text), because that's handled by the main web servers and you can add lots more of those. You're still limited in how much dynamic content you can serve, because every piece of dynamic content comes from the DB.
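    A minimal sketch of that caching idea, assuming the dynamic content servers keep generated HTML fragments in memcached so repeated requests skip the DB entirely; the server addresses, key names, and render_article_from_db() are placeholders:

        use strict;
        use warnings;
        use Cache::Memcached;

        # Hypothetical pool of cache servers shared by the dynamic content tier
        my $memd = Cache::Memcached->new({
            servers => [ '10.0.0.10:11211', '10.0.0.11:11211' ],
        });

        sub fetch_fragment {
            my ($article_id) = @_;
            my $key = "article_html:$article_id";

            # Serve straight from the cache when we can ...
            my $html = $memd->get($key);
            return $html if defined $html;

            # ... otherwise hit the DB, render, and cache the result for 5 minutes
            $html = render_article_from_db($article_id);   # your own DB/templating code
            $memd->set($key, $html, 300);
            return $html;
        }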

    The "big boys" are lucky. Plain old web search requires no user data, all you need is a cluster of machines with the data. You can scale with user growth by simply throwing another cluster at it. Scaling with data growth is going to be the hard part.

      One way to scale database interaction better is to set up slave DB servers which are used only for querying data. The master DB server is used for inserting new data and replicating it to the slaves. Architecturally this is a lot easier than a truly replicated database setup (with multiple masters that can be used for both reading and writing), and several such solutions exist (see, for example, Slony for PostgreSQL). fergal's point still applies, though: not all applications can benefit from this kind of setup.

      So if you want to leave this option open for the future while building your web app, you should use separate handles for reading and writing to the DB. They can point to the same server in the first deployment, and later be changed to access the master and the slave(s) when you need to scale.
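      A minimal sketch of that split using plain DBI; the DSNs, credentials, and table are placeholders, and at first both handles can simply point at the same server:

          use strict;
          use warnings;
          use DBI;

          # Initially both handles can point at the same box; later $dbh_r moves
          # to a read-only slave while $dbh_w stays on the master.
          my $dbh_w = DBI->connect( 'dbi:mysql:database=app;host=db-master',
                                    'appuser', 'secret', { RaiseError => 1 } );
          my $dbh_r = DBI->connect( 'dbi:mysql:database=app;host=db-slave1',
                                    'appuser', 'secret', { RaiseError => 1 } );

          # Writes always go through the master handle
          my ( $user_id, $body ) = ( 42, 'hello world' );
          $dbh_w->do( 'INSERT INTO comments (user_id, body) VALUES (?, ?)',
                      undef, $user_id, $body );

          # Reads go through the slave handle
          my $rows = $dbh_r->selectall_arrayref(
              'SELECT user_id, body FROM comments ORDER BY id DESC LIMIT 10'
          );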


Re: Hardware scalable web architecture
by wazoox (Prior) on Apr 10, 2006 at 10:49 UTC
    You can store data in sessions, but you may need to be able to share the sessions between servers, either by storing them in a database or on some shared storage (an NFS server).
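    A minimal sketch of the database route, assuming Apache::Session::MySQL with its standard sessions table; the DSN and credentials are placeholders. Any web server that can reach this database sees the same session data:

        use strict;
        use warnings;
        use DBI;
        use Apache::Session::MySQL;

        # Placeholder connection; every server in the pool points at the same DB,
        # which holds Apache::Session's usual 'sessions' table.
        my $dbh = DBI->connect( 'dbi:mysql:database=sessions;host=db-master',
                                'appuser', 'secret', { RaiseError => 1 } );

        # Pass an existing session id (e.g. from a cookie), or undef to create one
        my $session_id = undef;
        tie my %session, 'Apache::Session::MySQL', $session_id,
            { Handle => $dbh, LockHandle => $dbh };

        $session{last_seen} = time();
        my $id_for_cookie = $session{_session_id};   # send this back in a cookie

        untie %session;   # flushes the session data back to the DB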
    Load balancing can be tricky too, because you don't want the load balancer itself to become a single point of failure... That's why many people start with the simplest load balancer of all, round-robin DNS (each new incoming connection resolves to a different server, and you have at least two DNS servers, don't you?).
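    A small sketch for sanity-checking a round-robin setup with Net::DNS (the hostname is a placeholder); you should see every front-end server's address, and the order rotates from query to query:

        use strict;
        use warnings;
        use Net::DNS;

        my $res   = Net::DNS::Resolver->new;
        my $reply = $res->query( 'www.example.com', 'A' )    # placeholder hostname
            or die 'DNS query failed: ', $res->errorstring;

        # Round-robin DNS simply hands out all the A records, rotated each time
        for my $rr ( $reply->answer ) {
            print $rr->address, "\n" if $rr->type eq 'A';
        }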
Re: Hardware scalable web architecture
by rhesa (Vicar) on Apr 10, 2006 at 14:11 UTC
    I'm still in the process of making our company's app fully scalable. The only outstanding issue is the file server: I have yet to find a good solution for mirroring our NFS server. Our data is in the low terabytes at the moment, and growing by a couple of gigabytes per day. A typical rsync run takes a couple of hours, so that's not a real-time solution. Does anyone have good pointers on this?

    As for the rest, everything is redundant. We have 4 tiers:

    1. Static front-end Apaches
    2. mod_perl Apaches (*)
    3. Replicated MySQL servers
    4. A big fileserver
    (*) Actually, this layer also has a number of Apaches set up to do straight CGI, for those long-running requests like file uploads.
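    Purely as a sketch of what tier 2 can look like in code (assuming mod_perl 2; the module name, database, and credentials are invented), a minimal response handler that renders HTML from one of the replicated MySQL servers:

        package My::Dynamic::Handler;
        use strict;
        use warnings;

        use Apache2::RequestRec ();
        use Apache2::RequestIO  ();
        use Apache2::Const -compile => qw(OK);
        use DBI;

        # One connection per Apache child, pointed at a MySQL replica (tier 3)
        my $dbh = DBI->connect( 'dbi:mysql:database=app;host=mysql-replica',
                                'appuser', 'secret', { RaiseError => 1 } );

        sub handler {
            my $r = shift;

            # Tier 1 (the static front-end Apaches) proxies dynamic requests here
            my ($title) = $dbh->selectrow_array('SELECT title FROM pages LIMIT 1');

            $r->content_type('text/html');
            $r->print("<html><body><h1>$title</h1></body></html>");
            return Apache2::Const::OK;
        }

        1;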

    Here are some links I found useful in setting this up:

    So that still leaves me with the file server. I haven't been able to investigate clustered filesystems such as GFS, Coda and others yet, but there may be a solution there.

    And the other big thing I'm not trying to worry about just yet is how to mirror across data centers. I'll cross that bridge when we get there :)

      To make your NFS servers highly available, see for instance http://linux-ha.org/HaNFS . It's pretty easy, but using a SAN is mandatory. Of course, if you're building an HA system, you're probably using serious RAID arrays, a SAN, and the like anyway.
        Thanks for that link! I'm not sure whether it will work in our setup, but I'll look into it.

        We're currently using an AoE (ATA over Ethernet) device (made by Coraid). But I have to admit that I don't know much about this stuff: it was all set up by our ISP.