in reply to Re^5: RFC: OtoDB and rolling your own scalable datastore
in thread RFC: OtoDB and rolling your own scalable datastore
You could work directly with fixed-width data files with fixed-width index files on a clustered file system.
This is true. Or you could use tuple storage, as chromatic pointed out. What I like about an RDBMS is that you get sorting and filtering for free, plus everything else that comes with this type of data system (mentioned above). OpenLDAP would likely be better for hierarchical data, though I would point out that OtoDB is flexible enough to handle hierarchies as well without being limited to them.
If you're using relational databases, why are you querying servers in sequence to see which has the data?
Of your points, I like this one the best, because this is a problem I see with the design, and I'm still pondering it.
First, I'll say that my thought all along for reducing network traffic was to couple OtoDB with caching, e.g. memcached. Straightforward and powerful.
But it is inefficient to send a SQL command blindly to n servers, especially when the WHERE clause will return only n-y records (where 0 < y < n), because at least y servers are then queried for nothing. For queries that return n or more records, I don't see a huge problem. In probably a lot of cases the number of records will easily exceed n, and with incremental inserts it's likely that data will exist on all servers for most queries.
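To make the n-servers argument concrete, here is a minimal sketch. It assumes that "incremental insert" means round-robin placement (record i goes to server i mod n); that reading, and the function name, are my own illustration and not part of OtoDB.

```python
def servers_with_matches(num_servers, num_records, matches):
    """Return the set of servers holding at least one matching record.

    Assumes round-robin placement: record i is stored on server
    i % num_servers. `matches` is a predicate standing in for a WHERE clause.
    """
    return {i % num_servers for i in range(num_records) if matches(i)}


# A broad query over 1000 records touches all 50 servers,
# so querying every server wastes nothing.
broad = servers_with_matches(50, 1000, lambda i: True)

# A query matching a single record touches one server,
# so 49 of the 50 queries come back empty.
narrow = servers_with_matches(50, 1000, lambda i: i == 7)
```

The wasted work is simply the number of servers minus the size of the returned set, which is why the single-record case is the one that hurts.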
In my examples above, where you have libraries and books, even a small library is likely to have 1000 books. It's doubtful that you'll have > 1000 servers, and if you did, you would probably have caching anyway.
But given the case where you have a user profile and 50 servers, login is highly inefficient, because you have to look on every server until you find the user and can check his password. However, it wouldn't be hard to extend OtoDB (or add logic to your app) to simply store, on a single server, the username/password and a pointer to the data unit where the profile is located, reducing your queries from 50 to 2. Update: Or couple OtoDB with a standard RDBMS server for some subset of the data, e.g. user login info.
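A rough sketch of that pointer idea, under stated assumptions: the `Shard` and `Directory` classes below are hypothetical in-memory stand-ins for database servers, not OtoDB code, and the password check is reduced to comparing a stored hash string.

```python
class Shard:
    """Hypothetical stand-in for one data server holding user profiles."""

    def __init__(self):
        self.profiles = {}
        self.queries = 0  # counts how often this server is asked

    def find_profile(self, username):
        self.queries += 1
        return self.profiles.get(username)


class Directory:
    """Hypothetical single server mapping username -> (password hash, shard index)."""

    def __init__(self):
        self.entries = {}
        self.queries = 0

    def lookup(self, username):
        self.queries += 1
        return self.entries.get(username)


def login_by_scan(shards, username):
    """Ask each server in turn until one returns the profile."""
    for shard in shards:
        profile = shard.find_profile(username)
        if profile is not None:
            return profile
    return None


def login_by_directory(directory, shards, username, password_hash):
    """Query 1: the directory. Query 2: the one shard it points to."""
    entry = directory.lookup(username)
    if entry is None or entry[0] != password_hash:
        return None
    stored_hash, shard_index = entry
    return shards[shard_index].find_profile(username)
```

With 50 shards and the profile on the last one, the scan issues 50 queries while the directory version issues 2 (one to the directory, one to the shard). The trade-off is that the directory server becomes a single point that every login depends on.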
But really, I just see this as caching, and I'm wondering if it should be part of OtoDB itself, or relegated to something that is already doing it, and would probably do it better. That being said, it still bothers me that in some cases querying each server is overkill. I'm still mulling, and your suggestions have definitely given me some more to think about.
As to adding servers to an existing set, this wouldn't automatically require rebalancing of data, but probably would in most cases. This is where using an RDBMS is helpful, because it wouldn't be terribly hard to create some backend processes that understand your data and know how to move some of it to the new server. OtoDB can't do this automatically, however.