Re^3: Integrated non-relational databases ?

RDBMSs can and do handle huge data sets. Was your GROUP BY grouping by an indexed column? Which RDBMS were you using for that? What kind of hardware were you using? That really seems a bit too drastic of a change, although I rarely have tables with more than 100,000 records. Is that a single GROUP BY, or is that the effect when you change a whole class of overlapping queries to use it? More servers and database replication are often the answer. Was it memory bound, processor bound, or IO bound?
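
One way to answer the indexed-column question is to ask the database for its query plan before and after adding an index. The following is only a minimal sketch using Python's built-in sqlite3 module as a stand-in, since the thread does not say which RDBMS was involved. The `orders` table, its columns, the index name and the `group_by_plan` helper are all made up for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def group_by_plan(conn, sql):
    """Return the human-readable detail text of each EXPLAIN QUERY PLAN row."""
    # Each plan row has four columns; the last one is the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

query = "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"

# Without an index, SQLite has to sort the rows itself to form the groups.
plan_without_index = group_by_plan(conn, query)

# Index the GROUP BY column. The index also includes amount, so it covers
# every column the query reads and the planner can walk it in group order.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id, amount)")
plan_with_index = group_by_plan(conn, query)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

In this sketch the first plan should mention a temporary B-tree built for the GROUP BY, while the second should scan the index instead. Other databases expose the same information through their own EXPLAIN variants, so the equivalent check is worth running on whatever system produced the slow GROUP BY.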

The problem with handmade solutions, and with anything tied closely to a certain language, is that you're giving up large amounts of flexibility. SQL was designed specifically so that different programs in different languages could talk to the same database and use the same data manipulation routines on the same data. You lose that if you're building it in some specialized database language that has no other support. While in some cases it's worthwhile to forgo convention and flexibility for performance, you have to be sure of what you're losing and what you're gaining. Being sure requires a lot more than a bit of ad-hoc testing on one example without accounting for possible machine deficiencies.

Re^4: Integrated non-relational databases ? by rootcho (Pilgrim) on Sep 29, 2007 at 01:13 UTC
I agree with many of the points you and the others mentioned. What I was glad to see in Mnesia was that you just add the next server and it uses it. Yes, you may still need to think about techniques to partition your data, but you don't have to worry so much about which replication scheme to use, or whether you can cluster at all. From what I read, you can in fact play the role of the query planner with your own code. In general this is hard to impossible to do in an RDBMS. I'm not saying Mnesia is better than, say, MySQL, PostgreSQL, etc. In fact I don't know how scalable Mnesia is in the first place ;) As a side question, I need to implement what I might call "slow/lazy queries". What I mean is a query that takes a long time to execute, say from 5 minutes to 1 hour, but doesn't take up CPU and IO resources, so that the server continues to work as if nothing else were happening. Do you have any idea how such a thing can be achieved, or whether it is doable at all with today's databases?
Re^5: Integrated non-relational databases ? by dragonchild (Archbishop) on Sep 29, 2007 at 01:55 UTC
The problem is that if your query takes more than 5 seconds, the snapshot of data it's looking at is out-of-date. So, you'll need to provide a solution to that problem. I've worked with queries that looked at millions of rows crossed with millions of rows and the longest I've ever had a query take was 15 seconds - that was OK because it was looking at archived data. Normally, queries shouldn't take more than 1 second. Taking longer usually means you've written the query wrong. Have you looked at the execution plan?

My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?