Here is what I mean.
A traditional relational database works on a client-server
model. There is a server process running, and clients
talk to the server. By contrast, Berkeley DB does not
run this way (though they likely have that as an option
by now). Instead, each process that wants to use the
data loads up the access routines, attaches to a shared
memory segment, and proceeds to fetch data. The shared
memory segment is how concurrently connected processes
cooperate and make sure that if one is writing while
another is reading, you don't run into problems.
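
To make that concrete, here is a rough sketch of what
each process does, using the Berkeley DB C API. The
environment path, database name, and key are made up for
the example, and the exact DB->open signature has shifted
between Berkeley DB releases, so treat this as an
illustration rather than production code.

    #include <db.h>      /* Berkeley DB C API */
    #include <string.h>
    #include <stdlib.h>

    int main(void) {
        DB_ENV *env;
        DB *db;
        DBT key, val;

        /* Create the environment handle and attach to (or
         * create) the shared regions under /var/tmp/bdb-env.
         * DB_INIT_MPOOL and DB_INIT_LOCK set up the shared
         * memory cache and lock tables that concurrent
         * processes use to coordinate with each other. */
        if (db_env_create(&env, 0) != 0) exit(1);
        if (env->open(env, "/var/tmp/bdb-env",
                      DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK,
                      0) != 0)
            exit(1);

        /* Open (or create) a btree database inside that
         * environment. */
        if (db_create(&db, env, 0) != 0) exit(1);
        if (db->open(db, NULL, "data.db", NULL, DB_BTREE,
                     DB_CREATE, 0644) != 0)
            exit(1);

        /* A simple write against the shared store. */
        memset(&key, 0, sizeof(key));
        memset(&val, 0, sizeof(val));
        key.data = "greeting";  key.size = sizeof("greeting");
        val.data = "hello";     val.size = sizeof("hello");
        db->put(db, NULL, &key, &val, 0);

        db->close(db, 0);
        env->close(env, 0);
        return 0;
    }

Every process that wants the data runs essentially that
same sequence itself; there is no server sitting in the
middle.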
This imposes three big limitations.
The first is that every process using the database has
to be on the same machine, attached to the same shared
memory segment.
The second is that you cannot have much of a security model
for your data. Each client has direct access to all of
the data if it wants it.
The third is that you have to manage, outside the
database itself, when processes are allowed to connect.
For instance, if someone kills a process that is
interacting with the dbm, the dbm is likely left in an
inconsistent state, and there is no way for anyone to
detect this automatically. To recover, you need to make
sure that nobody else is going to connect to the
database, then have a single process repair it. While
the library provides facilities for that repair, in a
web environment it is up to you to make sure that all
web processes coordinate on a single lock file so they
know not to touch the database while it is being
repaired.
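
One way to handle that coordination, sketched below, is
to have every web process take a shared flock() on a
well-known lock file before opening the environment, and
to have the repair run take an exclusive lock. The file
name and helper function are made up for illustration;
the point is that this discipline has to live in your
code, not in the dbm.

    #include <sys/file.h>   /* flock() */
    #include <fcntl.h>      /* open() */
    #include <unistd.h>     /* close() */

    /* Hypothetical helper: every web process calls this
     * with exclusive = 0 before opening the environment;
     * a repair run calls it with exclusive = 1, which
     * waits for all the shared holders to finish and
     * keeps new ones out while recovery is performed. */
    int coordinate(const char *lockfile, int exclusive)
    {
        int fd = open(lockfile, O_CREAT | O_RDWR, 0644);
        if (fd < 0)
            return -1;
        if (flock(fd, exclusive ? LOCK_EX : LOCK_SH) != 0) {
            close(fd);
            return -1;
        }
        return fd;  /* closing the fd releases the lock */
    }

    /* Normal request:
     *   int fd = coordinate("/var/tmp/bdb-env/.coord", 0);
     *   ...open the environment and do the work...
     *   close(fd);
     *
     * Repair (one process, everyone else shut out):
     *   int fd = coordinate("/var/tmp/bdb-env/.coord", 1);
     *   ...run the library's recovery routines...
     *   close(fd);
     */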
The client-server model involves a lot more up-front
overhead, but suffers from none of the above
deficiencies. The third limitation in particular is why,
when I investigated dbms a couple of years ago, I
decided that a dbm was not appropriate for any critical
data in a web environment.