knowmad has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks,

I've been wrangling with BerkeleyDB today trying to get a grip on how to do concurrent access with this database in order to provide a web interface to a legacy database. I am using BerkeleyDB.pm v0.25 with BerkeleyDB v4.2.

I'm using the DB_INIT_CDB and DB_INIT_MPOOL flags to initialize locking for the Berkeley DB Concurrent Data Store. I found an example on the web from Paul M., who wrote BerkeleyDB.pm, showing how to request a write lock. So far, so good.
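
For reference, here is a minimal sketch of the setup I am describing (the home directory, file name, and choice of BerkeleyDB::Hash are placeholders, not my actual application code):

    use strict;
    use BerkeleyDB;

    # Open a Concurrent Data Store (CDS) environment.
    my $env = BerkeleyDB::Env->new(
        -Home  => '/path/to/dbhome',                        # placeholder
        -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
    ) or die "cannot open environment: $BerkeleyDB::Error\n";

    my $db = BerkeleyDB::Hash->new(
        -Filename => 'legacy.db',                           # placeholder
        -Flags    => DB_CREATE,
        -Env      => $env,
    ) or die "cannot open database: $BerkeleyDB::Error\n";

    # Per Paul M.'s example, a write lock is requested by opening a
    # cursor with the DB_WRITECURSOR flag.
    my $cursor = $db->db_cursor(DB_WRITECURSOR)
        or die "cannot open write cursor: $BerkeleyDB::Error\n";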

My problem comes when I want to use this lock. There is not much in the BerkeleyDB.pm pod that explains using the DB_WRITECURSOR flag, so I have resorted to the docs at sleepycat.com(1).

To my tired eyes, it seems that these docs are saying that I need to use a cursor to get a write lock. OK, but once I open a cursor, the docs say that I should not call db_put. OK, so let's try c_put. Unfortunately, that only appears to overwrite or duplicate the record under the cursor. I want to insert a new record. I do not see any flag available for doing that.
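
What I am trying looks more or less like this (the key and value names are purely illustrative); as far as I can tell, DB_CURRENT replaces the data under the cursor and DB_AFTER/DB_BEFORE only add duplicates next to it, none of which gives me a plain insert of a brand-new key:

    # Position the write cursor on an existing record first.
    my ($k, $v) = ('', '');
    $cursor->c_get($k, $v, DB_FIRST) == 0
        or die "c_get failed: $BerkeleyDB::Error\n";

    # Replaces the data portion of the record under the cursor --
    # not the "insert a new record" I am after.
    $cursor->c_put($k, 'replacement value', DB_CURRENT) == 0
        or die "c_put failed: $BerkeleyDB::Error\n";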

Could someone familiar with concurrency in BerkeleyDB please shed some light on this issue for me? Some example code would be excellent!

Update: I finally figured out the proper incantation of flags needed to get BerkeleyDB to operate safely in a multi-process environment. See node #330510 "Testing DB concurrency with BerkeleyDB" for more details.

Many Thanks,
William

(1) Berkeley DB Concurrent Data Store applications

Re: Concurrent access with BerkeleyDB
by no_slogan (Deacon) on Feb 20, 2004 at 06:21 UTC
    Once you have locking initialized in your BerkeleyDB environment (which it sounds like you do), you can mostly forget about it. BerkeleyDB will automatically acquire and release locks as needed. You only need to give the DB_WRITECURSOR flag if you really want to use a cursor to write to the database. If you do all your writes with db_put or tied hash assignments, you don't need to worry about it.
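
    A minimal sketch of what that looks like in practice (handle names follow the setup in the question; treat this as illustrative rather than tested):

        my ($key, $value) = ('some_key', 'some value');   # illustrative

        # With DB_INIT_CDB, each db_put takes and releases its own
        # internal write lock -- no explicit cursor is required.
        $db->db_put($key, $value) == 0
            or die "db_put failed: $BerkeleyDB::Error\n";

        # The tied-hash interface behaves the same way:
        tie my %h, 'BerkeleyDB::Hash',
            -Filename => 'legacy.db',    # placeholder
            -Flags    => DB_CREATE,
            -Env      => $env
          or die "tie failed: $BerkeleyDB::Error\n";
        $h{$key} = $value;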

      Thanks for the feedback. That's what I've read, but it's not what I'm seeing in my tests, which fork two children and try to do concurrent inserts. I can watch the database fly through the inserts, but eventually it hangs and never (well, at least not within 15 minutes ;-) releases.

      I even went to the trouble of enabling transactions, but that did not prevent the lock-up from occurring. I have had some initial success with throttling the inserts via sleep 1 (too slow for effective testing) or select(undef, undef, undef, 0.25) as described in perlfaq8.
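
      Roughly the shape of the test, trimmed down (the full script will go in the Meditations node); in this sketch each child opens its own environment and database handles after the fork, with the perlfaq8-style throttle shown in place:

          use strict;
          use BerkeleyDB;

          for my $child (1 .. 2) {
              defined(my $pid = fork) or die "fork failed: $!";
              next if $pid;    # parent keeps forking

              # Child: open fresh handles rather than reusing the parent's.
              my $env = BerkeleyDB::Env->new(
                  -Home  => '/tmp/cdstest',    # placeholder
                  -Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL,
              ) or die "env: $BerkeleyDB::Error\n";
              my $db = BerkeleyDB::Hash->new(
                  -Filename => 'test.db',      # placeholder
                  -Flags    => DB_CREATE,
                  -Env      => $env,
              ) or die "db: $BerkeleyDB::Error\n";

              for my $i (1 .. 1000) {
                  my $key = "key_" . $$ . "_$i";
                  $db->db_put($key, "value $i") == 0
                      or die "db_put: $BerkeleyDB::Error\n";
                  select(undef, undef, undef, 0.25);    # the perlfaq8 throttle
              }
              exit 0;
          }
          wait for 1 .. 2;    # reap both children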

      This "solution" seems more like a hack for an inefficient locking system. Is this a limitation of BerkeleyDB.pm or in the BSDDB libraries? Or perhaps it's my poor understanding of forking which is at fault.

      My concern in getting this right is that I have a client with a large database of records which I'd hate to corrupt (I'm weaning them away from Python, and corrupting the database would not be a good thing for proving Perl's effectiveness). I've seen several posts by Movable Type users who have had corruption with BDB. Most of the data is recoverable, but the downtime is both annoying and expensive (when it's a commercial business).

      Yes, I am considering moving the client to a stronger database solution such as PostgreSQL or MySQL. I just wanted to fully exhaust this line of inquiry before recommending the switch. Sleepycat has done some nice work with BerkeleyDB, but if it can't handle reasonable concurrency, then I won't risk using it in a production environment.

      I'll post my test script in Meditations in the hope that someone can point out the flaw in my logic or in my implementation of BerkeleyDB.

      Wm