If you're dealing with flat, denormalized data spread across several servers, what are the advantages of this approach over other techniques?

You could work directly with fixed-width data files and fixed-width index files on a clustered file system. That solution lets the file system handle redundancy, distribution across multiple servers, and fault tolerance. The storage portion is already written, and it can be very efficient; you'd just need to write the file handling, data locking, and search routines.
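For illustration, here's a minimal sketch of the fixed-width approach in Perl, using pack/unpack templates. The record layout and locking strategy are assumptions for the example, not anything from the original post:

<code>
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Hypothetical fixed-width record: 16-byte key, 64-byte value.
my $TEMPLATE = 'A16 A64';
my $RECLEN   = 80;

# Append a record under an exclusive lock so concurrent writers
# on the clustered file system don't interleave their output.
sub write_record {
    my ($path, $key, $value) = @_;
    open my $fh, '>>', $path or die "open $path: $!";
    flock $fh, LOCK_EX       or die "flock $path: $!";
    print {$fh} pack($TEMPLATE, $key, $value);
    close $fh;
}

# Fetch record N by seeking straight to its byte offset --
# with every record the same width, no index lookup is needed.
sub read_record {
    my ($path, $recno) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    seek $fh, $recno * $RECLEN, SEEK_SET or die "seek: $!";
    read $fh, my $buf, $RECLEN           or die "read: $!";
    close $fh;
    return unpack $TEMPLATE, $buf;
}
</code>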

OpenLDAP allows you to write to one server (with failover to another) and query as many servers as you want in round-robin fashion. Some other LDAP servers allow more than one server to accept writes at a time. If your data is more hierarchical than relational, then using a hierarchical database like a directory service makes sense. Every benchmark I've done or read elsewhere shows OpenLDAP eating the lunch of RDBMS systems on write-seldom, read-often data.
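A rough sketch of that read pattern with Net::LDAP, assuming a single write master and a pool of read-only replicas (the host names and base DN are made up):

<code>
#!/usr/bin/perl
use strict;
use warnings;
use Net::LDAP;

# Writes go to one master; reads rotate across the replicas.
# All host names and the base DN below are hypothetical.
my $master   = 'ldap-master.example.com';
my @replicas = map { "ldap$_.example.com" } 1 .. 4;
my $next     = 0;

# Pick the next replica round-robin and connect to it.
sub read_server {
    my $host = $replicas[ $next++ % @replicas ];
    my $ldap = Net::LDAP->new($host)
        or die "connect to $host failed: $@";
    return $ldap;
}

my $ldap = read_server();
$ldap->bind;    # anonymous bind
my $result = $ldap->search(
    base   => 'ou=People,dc=example,dc=com',
    filter => '(uid=jdoe)',
);
die $result->error if $result->code;
print $_->get_value('cn'), "\n" for $result->entries;
</code>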

If you're using relational databases, why are you querying servers in sequence to see which has the data? A good hashing algorithm for choosing which DB server to query could cut down on quite a bit of traffic. Set up three different hash functions for three different data points in your data row. Hash against all three for each piece of data that comes in, and on each write store the row to all three back-end servers it maps to. You then have three copies of everything, spread evenly among the servers (assuming good hash functions are selected), and you can hash against whichever portion you're querying and get the data back out of just one server.

Replicate the front end, but don't bother replicating the back-end data stores, because they're already storing in triplicate. If a data store server fails, you can reconstruct what it held from the front-end tables and the other data stores pretty easily; in fact, it'd be pretty simple to write a general-case program with DBI to do just that. As you scale up, you must adjust the hash functions to map to more back-end servers and prepopulate those servers with the appropriate data from the existing ones. I don't see how to balance the storage load onto new servers with your method at all, other than pulling random rows across.
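Here's a back-of-the-envelope sketch of that triple-hash scheme in Perl with DBI. Everything specific is invented for illustration: the DSNs, the table and column names, and the use of salted Digest::MD5 to stand in for "three different hash functions":

<code>
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Digest::MD5 qw(md5);

# Hypothetical pool of back-end servers, one DBI handle apiece.
my @dsn = map { "dbi:mysql:store;host=db$_.example.com" } 1 .. 6;
my @dbh = map { DBI->connect($_, 'user', 'pass', { RaiseError => 1 }) } @dsn;

# Three "different" hash functions: MD5 with distinct salts,
# each folded down to a back-end server index.
my @salt = ('alpha', 'beta', 'gamma');

sub server_for {
    my ($which, $value) = @_;
    return unpack('N', md5($salt[$which] . $value)) % @dbh;
}

# Write one row to all three servers it maps to, hashing a
# different column each time. (A real version would de-duplicate
# the target list in case two hashes land on the same server.)
sub store_row {
    my ($name, $email, $phone) = @_;
    my @keys = ($name, $email, $phone);
    for my $i (0 .. 2) {
        my $dbh = $dbh[ server_for($i, $keys[$i]) ];
        $dbh->do('INSERT INTO people (name, email, phone) VALUES (?, ?, ?)',
                 undef, $name, $email, $phone);
    }
}

# Read back from just one server, hashing on whichever column
# the query is against (email is hash function 1 here).
sub find_by_email {
    my ($email) = @_;
    my $dbh = $dbh[ server_for(1, $email) ];
    return $dbh->selectrow_hashref(
        'SELECT name, email, phone FROM people WHERE email = ?',
        undef, $email);
}
</code>

Rebalancing onto a new server then amounts to growing @dbh (which changes the modulus) and copying over the rows whose hashes now map elsewhere.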


In reply to Re^5: RFC: OtoDB and rolling your own scalable datastore by mr_mischief
in thread RFC: OtoDB and rolling your own scalable datastore by arbingersys
