I've been playing around with Google's App Engine recently, and blogging about the datastore a little. It turns out they prefer their data denormalized -- which was kind of hard for me to adjust to at first, I must admit. Amazon's SimpleDB is the same way.

I've gotten used to the idea, however, at least for some kinds of applications. Not only that, it got me to wondering what would happen if you used a standard RDBMS like MySQL and denormalized your data? Would it scale easily? Would its performance be good?

I pursued the idea and came up with OtoDB::MySQL, a Perl module that provides a simple interface for using data across multiple MySQL servers. It forces you to structure your data using one-to-one relationships, and because of this, makes scaling (adding additional data servers) a straightforward proposition. As it currently works, OtoDB::MySQL simply reads and writes data incrementally on each data server in a given list of servers, effectively striping data across n data units.

The "spec" can be downloaded here (pdf, 106k). Pretty much every thought I've had on the subject to date (correct or not) is in this document.

Here's the OtoDB 0.01 source (tar.gz).

use OtoDB::MySQL;

# Logic to iterate through units is the responsibility of the user.
# These structures are used for the round-robin approach in this
# example.
my $curr_unit = 0;
my @units = (
    # unit, connection string, username, password
    ['srv1', 'DBI:mysql:otodb:localhost',  'root', ''],
    ['srv2', 'DBI:mysql:otodb:10.1.6.212', 'root', 'sandwich']
);

# Instantiate the object.
# next_unit() handles iteration over the array of servers above.
# In a 'real-world' example, you might delegate these duties to an
# external server, a global library, etc.
my $m = OtoDB::MySQL->new( units => \&next_unit );

# Create the data models.
# 'fields' are just MySQL column names/types
$m->create(
    model  => 'library',
    fields => {
        libid  => 'int not null',
        name   => 'varchar(30)',
        street => 'varchar(50)',
        city   => 'varchar(30)'
    }
);

# The 'oto_ref' column creates a link to the 'library' model.
# All references are required fields (i.e. 'not null')
$m->create(
    model  => 'book',
    fields => {
        title  => 'varchar(30)',
        author => 'varchar(30)'
    },
    oto_ref => 'library'
);

# Now, we add some data
# $key = [ entity, unit (server), unique ID of record ]
my $key = $m->add(
    entity => 'library',
    data   => {
        # genid() creates a universal 'user' ID
        libid  => $m->genid( @{$units[0]} ),
        name   => 'Library A',
        street => '131 Street',
        city   => 'City One'
    }
);

# Link to 'library' using $key from previous add()
$m->add(
    entity  => 'book',
    data    => { title => 'of mice and men', author => 'steinbeck' },
    oto_ref => $key
);

# Add books to a second library.
# Use the first server as the ID server. Using a single server makes sure
# that user data IDs will be unique across all units.
my $libid = $m->genid( @{$units[0]} );

# Combine add() calls to satisfy 'oto_ref'.
$m->add(
    entity  => 'book',
    data    => { title => 'of mice and men', author => 'steinbeck' },
    oto_ref => $m->add(
        entity => 'library',
        data   => {
            libid  => $libid,   # unique 'user' ID for library B
            name   => 'Library B',
            street => '111 Street',
            city   => 'City the Third'
        }
    )
);

$m->add(
    entity  => 'book',
    data    => { title => 'winter of our discontent', author => 'steinbeck' },
    oto_ref => $m->add(
        entity => 'library',
        data   => {
            libid  => $libid,   # unique 'user' ID for library B
            name   => 'Library B',
            street => '111 Street',
            city   => 'City the Third'
        }
    )
);

# Query data.
# Data is conceptually 'flat', or one-to-one. The 'entity'
# parameter is basically a hint. The '<' means that the entity
# to the right is a sub-entity of the entity to the left. Consider:
#   entity => 'library < book < item_detail'
# Because 'book' is a many to 'library', save processing cycles
# by NOT reducing.
my @ents = $m->get(
    entity => 'library < book',
    fields => 'library.city, library.name, library.street, book.title',
    filter => 'library.name like ?',
    params => ['%two%'],
    order  => 'book.title asc',
    reduce => 'no'   # 'reduce' assumed if omitted
);

# @ents is an array of lightweight objects - OtoDB::MySQL::Entity
foreach my $ent (@ents) {
    print $ent->field('name') . "\t" . $ent->field('city') . "\n";
    print "data:\t" . join('..', $ent->row), "\n";
}

# Set 'reduce' to 'yes' to eliminate duplicates due to data being
# spread through multiple units. Generally, this will only be
# necessary for 'one' entities, i.e. entities that have duplicates.
@ents = $m->get(
    entity => 'library',
    fields => 'library.name, library.street',
    order  => 'library.name desc',
    reduce => 'yes'
);

foreach my $ent (@ents) {
    print $ent->field('name') . "\t" . $ent->field(1) . ":\n";
    print "\tdata:\t", join('..', $ent->row), "\n\n";
}

# 'reduce' is specifically turned off. You will see duplicates from
# each unit.
@ents = $m->get(
    entity => 'library',
    fields => 'library.name, library.street',
    order  => 'library.name asc',
    reduce => 'no'
);

foreach my $ent (@ents) {
    print $ent->field('name') . "\t" . $ent->field(1) . ":\n";
    print "\tdata:\t", join('..', $ent->row), "\n\n";
}

# Duplicating an entity.
# dup() is given a filter expression, and returns the first entity
# it finds in the data store matching that expression. Here we
# add another book based on our 'library ID'.
$m->add(
    entity  => 'book',
    data    => { title => 'grapes of wrath', author => 'steinbeck' },
    oto_ref => $m->dup(
        entity => 'library',
        filter => 'library.libid = ?',
        params => [$libid]   # remember this guy from above?
    )
);

# Update.
# 'fields' is a collection of the column names and the new
# values you want to replace.
$m->update(
    entity => 'library < book',
    fields => {
        # use fully qualified field names
        'library.name' => 'new lib a',
        'book.title'   => 'OFMICEANDMEN'
    },
    filter => 'library.name like ? and book.author like ?',
    params => ['%two%', 'steinbeck']
);

# Deleting.
# Delete based on a filter. Note: OtoDB does a check to see
# if there are any references from or to an entity, if you
# pass a single entity as a param. It dies if any exist.
$m->del(
    entity => 'library < book',
    filter => 'library.name like ?',
    params => ['%one%']
);

# Delete book matching title at a given library
$m->del(
    entity => 'library < book',
    filter => 'library.name like ? and book.title like ?',
    params => ['%lib%', '%winter%']
);

# Increment through units, round-robin. A required subroutine.
# A reference to this routine is passed to the $m object upon
# instantiation, and it's how OtoDB::MySQL gets the next unit for
# data operations.
sub next_unit {
    # get current
    my $n = $curr_unit;

    # increment
    $curr_unit = ++$curr_unit > $#units ? 0 : $curr_unit;

    return $units[$n];
}

I find it hard to get a feel for how an idea really works until I've actually implemented something in it. So I made a rudimentary Wiki application called RubberWiki (included in the source above, see the rubwiki folder), based on OtoDB::MySQL. In my testing it was trivial to move from a single MySQL server to two (one config file edit), and query performance was good against both servers.
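To give a concrete sense of what that one-edit move looked like, here is a sketch of a unit list of the kind RubberWiki reads at startup. The file name and exact layout are my illustration, not necessarily what ships in the rubwiki folder; going from one MySQL server to two is just a matter of adding a second entry.

# units.pl -- hypothetical config file returning the list of data units.
# The application slurps this at startup and iterates over it with a
# round-robin next_unit() routine, as in the example above.
[
    # unit name, DBI connection string, username, password
    ['srv1', 'DBI:mysql:rubwiki:localhost',  'wiki', 'secret'],
    ['srv2', 'DBI:mysql:rubwiki:10.1.6.212', 'wiki', 'secret'],  # the new server
];

# Somewhere in the application (also a sketch):
my @units = @{ do './units.pl' or die "can't read units.pl: $@ $!" };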

I think that querying -- OtoDB::MySQL->get() -- is currently where the bottleneck will appear once the number of data servers grows past a certain point. Some things I think would improve this:

I'd like to hear your thoughts. Thanks.

A blog among millions.

Re: RFC: OtoDB and rolling your own scalable datastore
by samtregar (Abbot) on Jul 14, 2008 at 21:52 UTC
    Not only that, it got me to wondering what would happen if you used a standard RDBMS like MySQL and denormalized your data? Would it scale easily? Would its performance be good?

    Yes and yes. This is pretty much exactly how we built our data-warehouse at a prior gig. You can read about it here:

    http://www.oreillynet.com/pub/a/databases/2007/04/12/building-a-data-warehouse-with-mysql-and-perl.html

    Of course, a data-warehouse isn't really all that similar to what you're proposing; it just happens to also be less-than-normalized. Personally I think these unrelational DBs are solutions in search of a problem - RDBMSes have problems, sure, but I'm not seeing how stripping away so many useful features helps me!

    -sam

      Thanks for the link. Excellent article.

      Did you also stripe your data across multiple servers? Or just do the denormalization?

      A blog among millions.
        No, we didn't do any striping. I think that would be counter-productive for a data-warehouse, but it might be a fun experiment.

        -sam

Re: RFC: OtoDB and rolling your own scalable datastore
by chromatic (Archbishop) on Jul 14, 2008 at 19:47 UTC

    If you're not going to take advantage of the features of an RDBMS, why use an RDBMS? Use a distributed tuple store such as memcached instead.

      I would say because you get more out of an RDBMS than merely a relational data structure. You also get:

      • High level and powerful query language (OtoDB uses this feature rather extensively)
      • Optimized storage and data retrieval engine (data indexes, etc)
      • A well-known paradigm
      • Additional power at the data server level like views, or clusters

      An RDBMS does more than just store and retrieve datasets. It also prepares data based on the SQL commands you send it. Maybe I'm wrong, but it seems that memcached is just a storage mechanism, i.e. your code has to handle sorting/filtering. With an RDBMS you can ask the data servers to do this.

      There are probably some other reasons too that I'm not thinking of. I should also point out that OtoDB does in fact use one relational feature, that of one-to-one relationships.
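      To make the sorting/filtering point concrete, here is a rough sketch of the difference. The memcached key, the shape of the cached value, and the plain 'book' table in the SQL are all made up for the example, not how OtoDB lays things out internally:

      use strict;
      use warnings;
      use Cache::Memcached;
      use DBI;

      # With memcached you get an opaque value back and your code does
      # the filtering and sorting itself.
      my $memd  = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });
      my $books = $memd->get('library:42:books') || [];   # hypothetical key; arrayref of hashrefs
      my @sorted = sort { $a->{title} cmp $b->{title} }
                   grep { $_->{author} eq 'steinbeck' } @$books;

      # With an RDBMS the server filters and sorts before the data ever
      # crosses the wire.
      my $dbh  = DBI->connect('DBI:mysql:otodb:localhost', 'root', '', { RaiseError => 1 });
      my $rows = $dbh->selectall_arrayref(
          'SELECT title FROM book WHERE author = ? ORDER BY title',
          { Slice => {} }, 'steinbeck'
      );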

      A blog among millions.

        You're not going to get 65k queries per second out of an RDBMS if any of those queries have to do JOINs, and you're not going to put Varnish in front of it to speed it up even further. You can do that with memcached and CouchDB, for example.

        I'm by no means suggesting that there's anything wrong with JOIN, or that every persistent data store needs to do what eBay or MySpace or Amazon.com does. Yet by the time your problem is that your database is the bottleneck, slapping something on top of the database doesn't seem like the right approach.

        (Similarly, I have the strong belief that if ActiveRecord could run on something other than an RDBMS, everyone would be a lot happier.)

      One problem with memcached is that it is hard to take a backup of it. I love the product, but I also like my data to be a little safer than that. Using memcached in front of a database gives me the best of both worlds.

      Another reason to use an RDBMS is that while your application may not take advantage of the relational structure, it is still there to be exported to and manipulated on reporting databases.

        Using memcached in front of a database gives me the best of both worlds

        This was kind of how I thought memcached fit best with OtoDB. As the code currently stands, you will only be able to add a certain number of data servers before querying them all incrementally becomes a bottleneck (at least if my intuition doesn't fail me here). memcached is the obvious choice for keeping queries frosty as you add more servers.
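        Here is a rough sketch of what I have in mind: a caching wrapper of my own around get(), not anything that exists in OtoDB today. The key scheme and the five-minute expiry are made up, and it assumes the returned entity objects serialize cleanly through Storable.

        use strict;
        use warnings;
        use Cache::Memcached;
        use Storable qw(freeze);
        use Digest::MD5 qw(md5_hex);

        my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

        # Check the cache first; only fan out to every unit on a miss.
        # $m is the OtoDB::MySQL object from the example above.
        sub cached_get {
            my ($m, %args) = @_;

            local $Storable::canonical = 1;                  # identical queries => identical keys
            my $key  = 'otodb:' . md5_hex( freeze(\%args) );

            my $ents = $memd->get($key);
            return @$ents if $ents;

            my @ents = $m->get(%args);
            $memd->set($key, \@ents, 300);                   # cache the result for 5 minutes
            return @ents;
        }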

        To go along with your other point, an RDBMS also makes it easy to rebalance data, e.g. if you add new servers to an existing set and want to move some of the load over.

        A blog among millions.
Re: RFC: OtoDB and rolling your own scalable datastore
by CountZero (Bishop) on Jul 15, 2008 at 05:39 UTC
    Even after reading your blog I fail to understand the benefits of the denormalized datastore. It seems that you either will fall into the trap of having lots of redundant data lying around (and the nightmare of keeping them all updated) or that you will have to re-invent all the JOIN and "reference" logic at the level of your application rather than having it in the datastore.

    Perhaps for some well defined problems the denormalized datastore is faster and more appropriate than a RDBMS, but in general it seems a poor choice.

    Or are we blinded by the ubiquity of the RDBMS model: "If all you have is a hammer, the whole world looks like a nail"?

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      ... you either will fall into the trap of having lots of redundant data lying around (and the nightmare of keeping them all updated) or that you will have to re-invent all the JOIN and "reference" logic at the level of your application rather than having it in the datastore.

      The data is redundant, certainly, but this falls under the mantra of "disk (and CPU) is cheap so stop worrying about it and scale", which seems to be Google's approach with the App Engine. You are sacrificing space for easy scalability.

      As to re-inventing the JOIN logic at the application level, OtoDB::MySQL does not do that (it does do some extra work on ordering, which I cover in the document above); it opts for the redundant data instead.

      Perhaps for some well defined problems the denormalized datastore is faster and more appropriate than a RDBMS, but in general it seems a poor choice.

      Here I agree completely. So do Google and Amazon, from what I can tell. OtoDB::MySQL is not going to replace an RDBMS for relationally intensive applications, like financial or business logic. But if you were LiveJournal, and realized you don't care about JOIN on a specific set of data (users) but you do have the problem of scaling, an OtoDB::MySQL solution makes sense, at least as far as I can see.

      A blog among millions.
        Thanks for your insights.

        However, it is not the space requirements of the redundant data which worry me (with terabyte disks within most people's reach, that is no longer an issue), but rather keeping all those redundant data coordinated. That seems a big task, and one which the RDBMS data model has already proven it can solve.

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      I think the argument for denormalisation is something like this: If you have a DB that is mostly read from (and more importantly rows are rarely updated), with lots of data and many concurrent requests, denormalisation is a way to improve performance by avoiding the overhead of a join. But as you point out, the price is added complexity when updating.

      It's a useful trick, but maybe not as useful as some think. It is, after all, just another way to optimise performance.
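      A contrived illustration of that trade-off, using plain DBI against made-up 'library', 'book' and 'book_flat' tables (nothing to do with OtoDB's internals):

      use strict;
      use warnings;
      use DBI;

      my $dbh = DBI->connect('DBI:mysql:test:localhost', 'root', '', { RaiseError => 1 });

      # Normalized: every read pays for the JOIN.
      my $joined = $dbh->selectall_arrayref(
          'SELECT b.title, l.name
             FROM book b JOIN library l ON b.libid = l.libid
            WHERE l.city = ?',
          { Slice => {} }, 'City One'
      );

      # Denormalized: the library columns are copied onto each book row,
      # so the read is a single-table lookup ...
      my $flat = $dbh->selectall_arrayref(
          'SELECT title, library_name FROM book_flat WHERE library_city = ?',
          { Slice => {} }, 'City One'
      );

      # ... but a change to the library now fans out to every copy.
      $dbh->do('UPDATE book_flat SET library_name = ? WHERE libid = ?',
               undef, 'Library A (renamed)', 42);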

        It is a trick used in certain data-mining applications. You denormalize your "live" data into a number of additional tables and do your data-mining queries on those denormalized tables. It is very fast, as there are far fewer JOINs to be done when querying the data, but you are limited in what you can query. If you do this denormalization as a batch job when the database server is not very busy (say every night or week-end) and put the additional tables on another server optimized for read access, I can see the benefit.

        The drawback is that the data becomes stale as soon as the live server updates and that not all queries are possible because you see the data through the mirror of the denormalization process. Sometimes this is even considered a benefit: "Now you can get at your data without having to write SQL." Yeah sure what a benefit ...
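        For what it is worth, a minimal sketch of such a batch job. Table and server names are invented, and it assumes the live tables are replicated to (or otherwise reachable from) the reporting server; you would run it from cron during a quiet period.

        use strict;
        use warnings;
        use DBI;

        # Hypothetical read-optimized reporting server.
        my $dbh = DBI->connect('DBI:mysql:reporting:10.1.6.250', 'report', 'secret',
                               { RaiseError => 1 });

        # Rebuild the denormalized table from the (replicated) live tables.
        $dbh->do('DROP TABLE IF EXISTS book_report');
        $dbh->do(q{
            CREATE TABLE book_report AS
            SELECT b.title, b.author, l.name AS library_name, l.city AS library_city
              FROM book b
              JOIN library l ON b.libid = l.libid
        });
        $dbh->do('CREATE INDEX idx_city ON book_report (library_city)');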

        CountZero

        A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James