You could work directly with fixed-width data files and fixed-width index files on a clustered file system. That solution lets the file system handle redundancy, distribution across multiple servers, and fault tolerance. The storage layer is already written for you, and it can be very efficient. You'd just need to write the file handling, data locking, and search routines.
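To make that concrete, here's a minimal sketch of the offset math fixed-width records buy you: record N lives at byte N * record-length, so a lookup is one seek instead of a scan. The 80-byte record length, field layout, and file path are all made up for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $RECLEN = 80;    # assumed fixed record width, trailing newline included
    my ($file, $recno) = ('/mnt/cluster/data.fw', 41);   # hypothetical clustered-FS path

    open my $fh, '<', $file or die "open $file: $!";
    seek $fh, $recno * $RECLEN, 0 or die "seek: $!";     # O(1) jump straight to the record
    read $fh, my $rec, $RECLEN or die "read: $!";
    my ($id, $name, $balance) = unpack 'A10 A50 A19', $rec;  # assumed field layout
    print "$id $name $balance\n";

An index file built the same way (fixed-width key plus record number) gets you keyed lookups with the same seek-and-read trick.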
OpenLDAP lets you write to one server (with failover to another) and query as many replica servers as you like in round-robin fashion. Some other LDAP servers allow more than one server to accept writes at a time. If your data is more hierarchical than relational, then using a hierarchical database like a directory service makes sense. Every benchmark I've run or read elsewhere shows OpenLDAP eating the lunch of RDBMS systems on write-seldom, read-often data.
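The read side really is that simple. Here's a rough sketch of round-robin reads with Net::LDAP; the replica host names and the search base are invented for the example:

    use strict;
    use warnings;
    use Net::LDAP;

    my @replicas = qw(ldap1.example.com ldap2.example.com ldap3.example.com);
    my $next = 0;

    sub search_rr {
        my ($filter) = @_;
        my $host = $replicas[ $next++ % @replicas ];    # simple round-robin
        my $ldap = Net::LDAP->new($host) or die "connect $host: $@";
        $ldap->bind;                                    # anonymous bind for the example
        my $res  = $ldap->search(base => 'dc=example,dc=com', filter => $filter);
        die $res->error if $res->code;
        return $res->entries;
    }

    print $_->dn, "\n" for search_rr('(uid=jsmith)');

Writes would go through a separate handle pointed at the master (or at whichever server holds the write role after a failover).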
If you're using relational databases, why query the servers in sequence to see which one has the data? A good hashing scheme for choosing which DB server to query could cut out quite a bit of traffic.

Set up three different hash functions over three different data points in your row. Hash against all three for each row that comes in, and write the row to all three back-end servers it maps to. Now you have three copies of everything, spread evenly among the servers (assuming good hash functions are selected). On a read, hash whichever field you're querying against and pull the data back out of just one server. Replicate the front end, but don't bother replicating the back-end data stores, because they already store everything in triplicate.

If a data store server fails, you can reconstruct what it held from the front-end tables and the other data stores fairly easily; in fact, it would be pretty simple to write a general-case program with DBI to do just that. As you scale up, you adjust the hash functions to map to more back-end servers and prepopulate those servers with the appropriate data from the existing ones. With your method, I don't see how to balance the storage load onto new servers at all, other than pulling random rows across.
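Here's a runnable sketch of that scheme. The field names (id, email, name) and the four-server layout are assumptions, and the in-memory hashes standing in for the back ends would be per-server DBI handles in real use; salting each hash with the field name keeps the three copies of a row on (usually) different servers:

    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    my @servers = map { {} } 0 .. 3;   # four stand-in back ends, one hash apiece

    # One hash function per queryable field, salted by field name.
    sub server_for {
        my ($field, $value) = @_;
        return hex(substr(md5_hex("$field:$value"), 0, 8)) % @servers;
    }

    sub write_row {
        my ($row) = @_;
        for my $field (qw(id email name)) {              # the three data points
            my $n = server_for($field, $row->{$field});
            push @{ $servers[$n]{"$field:$row->{$field}"} }, $row;
        }
    }

    sub read_rows {                  # hash only the field being queried,
        my ($field, $value) = @_;    # then hit exactly one server
        my $n = server_for($field, $value);
        return @{ $servers[$n]{"$field:$value"} || [] };
    }

    write_row({ id => 7, email => 'a@example.com', name => 'Ann' });
    my ($hit) = read_rows(email => 'a@example.com');
    print "found id $hit->{id}\n" if $hit;

Rebalancing onto a new server is then a matter of changing the modulus in server_for and migrating only the keys whose mapping changed, which is exactly the prepopulation step described above.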