in reply to Integrated non-relational databases ?

People have been pushing things like this for years. Seriously, it was an old idea a decade ago, when the first round of major object database vendors were going bankrupt.

Why don't they replace RDBMSes? My guess would be that part of it has to do with an exaggeration of the performance and scalability they offer. We've developed a very good understanding of how to do concurrency and data safety in large-scale RDBMSes, and I don't believe the object database vendors when they claim they've duplicated all of that.

A lot of it though is probably the thing you suggest is a weakness: SQL. With SQL, many ad hoc reporting tasks don't require the help of a programmer at all. Remember, SQL was invented so that business people could write their own reports, and some of them do. Some minimally trained HTML jockeys do too. When you lock up the data behind a Java or Erlang API, you lose something valuable.

By the way, there have been interfaces for Perl over the years to things like AceDB and ObjectStore.


Re^2: Integrated non-relational databases ?
by rootcho (Pilgrim) on Sep 26, 2007 at 17:45 UTC
    But from what I read in the recent news, most of the big sites are more and more abandoning RDBMS systems, in most cases in favor of hand-made solutions. Sometimes completely RDBMS-less, sometimes a mix.
    What I'm saying is that current RDBMSes can't handle very large data sets in a real-time environment.
    For example, I was recently doing experiments with a very simple table of 10_000_000 records, which fit in memory. The moment I tried something other than a plain lookup, let's say a GROUP BY, execution time was minutes instead of milliseconds.

    That is why I was thinking that if you do this uplifting into the language's own data structures, it would be easier, I think, to come up with more efficient caching schemes, sharding, and similar techniques, so you can stay in the "millisecond range" more easily even for very large datasets.
    Mind you, this is just a thought, not a conclusion about which is best :). It is very hard to test such things at large scale, and of course the requirements of every app are different.
    http://radar.oreilly.com/archives/2006/04/database_war_stories_5_craigsl.html
    Look at the links at the end of the article, too.
      Yeah, I read that series when it came out. I don't see how you came to the conclusion that sites are not using RDBMSes. Most people interviewed said they use MySQL and have figured out how to scale it. Google wrote something custom for some of their data, and one guy used Berkeley DB, but most of them use RDBMSes for most things. Even Google makes heavy use of MySQL.
      RDBMSes can and do handle huge data sets. Was your GROUP BY grouping by an indexed column? Which RDBMS were you using for that? What kind of hardware were you using? That really seems too drastic a change, although I rarely have tables with more than 100,000 records. Is that a single GROUP BY, or is that the effect when you change a whole class of overlapping queries to use it? More servers and database replication are often the answer. Was it memory bound, processor bound, or IO bound?
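As a rough illustration of why the indexed-column question matters: a minimal sketch in Python with SQLite as a stand-in (the table and data here are made up, not rootcho's actual schema). Without an index, the engine typically has to sort the whole table into a temporary structure before it can group; with an index on the grouped column, it can stream the groups straight off the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE hits (user_id INTEGER, bytes INTEGER)")
cur.executemany("INSERT INTO hits VALUES (?, ?)",
                [(i % 100, i) for i in range(100_000)])
conn.commit()

def plan(sql):
    # EXPLAIN QUERY PLAN shows how SQLite intends to run the query
    return " ".join(row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT user_id, COUNT(*) FROM hits GROUP BY user_id"
print(plan(q))  # no index: SQLite sorts into a temp structure first

cur.execute("CREATE INDEX idx_hits_user ON hits (user_id)")
print(plan(q))  # with the index: it can scan the index, no sorting pass

rows = cur.execute(q).fetchall()
```

On a 10-million-row table that sorting pass is exactly the kind of thing that turns milliseconds into minutes, which is why the index question comes first.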

      The problem with hand-made solutions, and with anything tied closely to a certain language, is that you're giving up large amounts of flexibility. SQL was designed specifically so that different programs in different languages could communicate with the same database and use the same data manipulation routines on the same data. You lose that if you're building it in some specialized database language that has no other support. While in some cases it's worthwhile to forgo convention and flexibility for performance, you have to be sure of what you're losing and what you're gaining. To be sure requires a lot more than a bit of ad hoc testing on one example without accounting for possible machine deficiencies.

        I agree with many of the points you and the others mentioned.
        What I was glad to see in Mnesia was that you just add the next server and it uses it. Yes, you may still need to think about techniques for partitioning your data, but you don't have to worry as much about which replication scheme to use, or whether you can cluster at all. From what I read, you can in fact play the role of the query planner with your own code. In general this is hard to impossible to do in an RDBMS.
        I'm not saying Mnesia is better than, say, MySQL, PostgreSQL, etc. In fact I don't know how scalable Mnesia is in the first place ;)


        As a side question, I need to implement what I might call "slow/lazy queries".
        That is, a query that takes a long time to execute, say from 5 minutes to 1 hour, but doesn't hog CPU and IO resources, so that the server continues to work as if nothing else were happening.
        Do you have any idea how such a thing can be achieved, or is it doable at all with today's databases?
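One way to get that effect, since databases generally don't expose a "run this SELECT at low priority" knob, is to do the pacing yourself: split the big query into small key-ranged chunks and sleep between them, trading total latency for server responsiveness. A sketch in Python with SQLite (the table name, chunk size, and pause are all made-up illustration values):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (user_id INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?)",
                 [(i % 10,) for i in range(50_000)])

def lazy_group_count(conn, chunk=5_000, pause=0.01):
    """Walk the table in rowid order, one small chunk at a time,
    sleeping between chunks so other clients keep getting served."""
    counts = {}
    last = 0
    while True:
        rows = conn.execute(
            "SELECT rowid, user_id FROM hits WHERE rowid > ? "
            "ORDER BY rowid LIMIT ?", (last, chunk)).fetchall()
        if not rows:
            break
        for rowid, uid in rows:
            counts[uid] = counts.get(uid, 0) + 1
        last = rows[-1][0]
        time.sleep(pause)  # yield CPU/IO back to the rest of the server
    return counts

counts = lazy_group_count(conn)
```

The same chunk-and-sleep pattern works against any RDBMS that lets you page by a key; the aggregate just has to be something you can accumulate incrementally across chunks.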