in reply to Re: Re (tilly) 1: BerkleyDB versions 2.x , 3.x
in thread BerkleyDB versions 2.x , 3.x

The locking gotchas first.

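Do not follow the locking advice in the Cookbook and the old DB_File docs. Never flock the handle to the dbm itself. By the time you have that handle the dbm is already open, and it may have read and cached data before you hold the lock. If you must use flock, flock an external lock file.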
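Here is a minimal sketch of the external lock file approach, assuming DB_File is installed. The helper name with_dbm_lock, the file paths, and the hits key are all made up for illustration. The point it tries to show is the ordering: take the lock, then tie, then untie, then release the lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_RDONLY O_CREAT);
use DB_File;

# Run $code while holding a lock on a separate lock file,
# never on the dbm's own file handle.
sub with_dbm_lock {
    my ($lock_path, $mode, $code) = @_;
    open(my $lock_fh, '+>>', $lock_path)
        or die "Cannot open lock file $lock_path: $!";
    flock($lock_fh, $mode)
        or die "Cannot lock $lock_path: $!";
    my @result = eval { $code->() };
    my $err = $@;
    close($lock_fh);    # closing the handle releases the lock
    die $err if $err;
    return wantarray ? @result : $result[0];
}

# Hypothetical file names, for illustration only.
my $db_path   = "/tmp/example_$$.db";
my $lock_path = "$db_path.lock";

# Writer: exclusive lock, tie only once the lock is held,
# and untie before the lock is released.
with_dbm_lock($lock_path, LOCK_EX, sub {
    tie(my %db, 'DB_File', $db_path, O_RDWR | O_CREAT, 0644, $DB_HASH)
        or die "Cannot tie $db_path: $!";
    $db{hits} = ($db{hits} || 0) + 1;
    untie %db;
});

# Reader: a shared lock is enough.
my $hits = with_dbm_lock($lock_path, LOCK_SH, sub {
    tie(my %db, 'DB_File', $db_path, O_RDONLY, 0644, $DB_HASH)
        or die "Cannot tie $db_path: $!";
    my $value = $db{hits};
    untie %db;
    return $value;
});
```

Every process that touches the dbm has to go through the same lock file for this to mean anything, since flock is only advisory.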

Next, the one-machine issue. Take a look at this list of what Berkeley DB is not. It is an access library. Furthermore, it is an access library that requires shared memory. As they point out, that makes it important to put Berkeley DB on a local filesystem.

That means that a given dbm is only directly accessible from one machine. You can have a client-server relationship (e.g. LDAP) so that the data can be indirectly accessed from multiple machines, though I have never tried that. Plus, since the library maps things into the process that is using it, recovering from unexpected application failure in a CGI environment is a non-trivial affair. (You *never* know when someone else is coming along, and race issues are a far bigger problem in a web environment than in traditional applications. When I last checked, admittedly a while ago, Berkeley DB was still catching up.)

In your situation this means that you will likely want to limit your dbm usage to serving read-only load. Unless your clustering solution allows you to reliably send a client back to the same machine until you have synchronized data, you really don't want to use it for read/write. You could, of course, stage read/write information to a local dbm and then transfer it to a permanent record later.
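The staging idea could look roughly like the sketch below. It assumes DB_File is installed, and every name in it (with_stage, stage_write, drain_stage, the paths) is invented for illustration. The $store callback is a placeholder for whatever actually writes your permanent record. A staged record is deleted only after $store returns without dying, so a failed transfer leaves it in place for the next run.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use DB_File;

# Hypothetical paths; the staging dbm lives on this machine's local disk.
my $stage_path = "/tmp/stage_$$.db";
my $lock_path  = "$stage_path.lock";

# Lock the external lock file, tie the staging dbm, run $code with a
# reference to the tied hash, then untie and release the lock.
sub with_stage {
    my ($code) = @_;
    open(my $lock_fh, '+>>', $lock_path)
        or die "Cannot open $lock_path: $!";
    flock($lock_fh, LOCK_EX)
        or die "Cannot lock $lock_path: $!";
    tie(my %stage, 'DB_File', $stage_path, O_RDWR | O_CREAT, 0644, $DB_HASH)
        or die "Cannot tie $stage_path: $!";
    my @out = eval { $code->(\%stage) };
    my $err = $@;
    untie %stage;
    close($lock_fh);    # closing the handle releases the lock
    die $err if $err;
    return wantarray ? @out : $out[0];
}

# Called from the request handler: record the write locally and return.
sub stage_write {
    my ($key, $value) = @_;
    with_stage(sub { $_[0]{$key} = $value });
}

# Called later (from cron, say): pass each staged record to $store and
# delete it only once $store has returned. Returns the number of keys
# that were staged when the drain began.
sub drain_stage {
    my ($store) = @_;
    return with_stage(sub {
        my ($stage) = @_;
        my @keys = keys %$stage;
        for my $key (@keys) {
            $store->($key, $stage->{$key});
            delete $stage->{$key};
        }
        return scalar @keys;
    });
}
```

Note that this holds the lock for the whole drain, which keeps the sketch simple but blocks writers while the transfer runs.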

Another significant detail that I discovered at the same time. All of the transactional guarantees that people give you with databases? To get them to work with Linux you must be using Linux 2.4, and you must have your data on a raw I/O partition. Otherwise there is a layer of buffering at the Linux filesystem level, which means that the database does not really know what has and has not hit disk. For most purposes this does not matter, but if you have a hard reliability target to hit, you should be aware of this. (I do not think that Linux is alone in having obscure limits like this; I just happen to know what they are for that OS.)
