PerlMonks  

Re: Re (tilly) 1: BerkleyDB versions 2.x , 3.x

by chorg (Monk)
on Apr 30, 2001 at 17:39 UTC (id://76579)


in reply to Re (tilly) 1: BerkleyDB versions 2.x , 3.x
in thread BerkleyDB versions 2.x , 3.x

Thanks, I'm reading the docs today.

Basically, we've got a clustered web server setup but only one database server. I would like to spread some of the data load, such as user authentication and per-site data, over more than one data management system. I'm not a great fan of DBMs, but when I saw what was possible with Berkeley DB 3.x, I was enthusiastic.

What did you mean when you said that the data is only directly accessible from one machine?

The locking gotchas that you know about - what are they?
_______________________________________________
"Intelligence is a tool used to achieve goals, however goals are not always chosen wisely..."

Re (tilly) 3: BerkleyDB versions 2.x , 3.x
by tilly (Archbishop) on Apr 30, 2001 at 18:44 UTC
    The locking gotchas first.

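    Do not follow the Perl Cookbook or the old DB_File docs here: never flock the handle to the dbm itself. By the time you can lock that handle, the library has already read from the file (and may have cached part of it), so the lock comes too late to protect you. If you must use flock, flock a separate, external lock file.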
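    The external-lock-file pattern can be sketched in Python. This is an illustration of the ordering, not Perl's DB_File API: a minimal sketch, assuming a POSIX system where `fcntl.flock` is available. The `locked_dbm` helper and the `.lock` suffix are hypothetical names. The lock is taken on a separate file before the dbm is opened and released only after the dbm is closed.

```python
import dbm
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def locked_dbm(db_path, mode="c", exclusive=True):
    """Yield an open dbm while holding a flock on a separate lock file.

    Hypothetical helper: the lock lives in db_path + ".lock", never on the
    dbm's own file handle. The lock is acquired before the dbm is opened and
    released only after it is closed, so the library never touches the
    database file while unlocked.
    """
    lock_path = db_path + ".lock"
    # Open (creating if needed) the external lock file.
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            # The dbm is opened only once the lock is held ...
            with dbm.open(db_path, mode) as db:
                yield db
            # ... and is closed when the inner "with" exits, before the unlock.
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "users")
    with locked_dbm(path) as db:  # exclusive lock for writing
        db["alice"] = "secret-hash"
    with locked_dbm(path, mode="r", exclusive=False) as db:  # shared lock for reading
        print(db["alice"])
```

    Writers take an exclusive lock and readers a shared one, so concurrent CGI readers do not block each other, while a writer waits until every reader has closed the dbm.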

    Next, the one-machine issue. Take a look at the Berkeley DB documentation's list of what Berkeley DB is not. It is an access library, not a database server, and furthermore it is an access library that requires shared memory. As the docs point out, that makes it important to keep Berkeley DB on a local filesystem.

    That means a given dbm is only directly accessible from one machine. You can put a server in front of it (e.g. LDAP) so that the data can be accessed indirectly from multiple machines, though I have never tried that. Also, because the library maps things into the process that is using it, recovering from an unexpected application failure in a CGI environment is non-trivial. (You *never* know when someone else is coming along, and race conditions are a far bigger problem in a web environment than in traditional applications. When I last checked, admittedly a while ago, Berkeley DB was still catching up.)

    In your situation this means you will likely want to limit your dbm usage to taking read-only load off the database server. Unless your clustering solution can reliably send a client back to the same machine until the data has been synchronized, you really don't want to use it for read/write. You could, of course, stage read/write information in a local dbm and transfer it to the permanent record later.
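    The staging idea might look something like this in Python. A minimal sketch, assuming each web node keeps a local staging dbm and periodically pushes it to the central database; `sqlite3` stands in for that database server here, and `stage_write`, `flush_stage`, and the `records` table are hypothetical names. Locking is left out for brevity, so a real version would still need the external lock-file discipline from the locking advice in this reply.

```python
import dbm
import os
import sqlite3
import tempfile


def stage_write(stage_path, key, value):
    """Record a write in the local staging dbm (cheap, local to this node)."""
    with dbm.open(stage_path, "c") as stage:
        stage[key] = value


def flush_stage(stage_path, central):
    """Copy staged records into the permanent store, then clear the stage.

    `central` is a sqlite3 connection standing in for the real database
    server; the `records(k, v)` table is hypothetical.
    """
    with dbm.open(stage_path, "c") as stage:
        keys = list(stage.keys())
        rows = [(k.decode(), stage[k].decode()) for k in keys]
        with central:  # one transaction: commit on success, roll back on error
            central.executemany(
                "INSERT OR REPLACE INTO records (k, v) VALUES (?, ?)", rows
            )
        # Clear the stage only after the central commit has succeeded.
        for k in keys:
            del stage[k]
    return len(rows)


if __name__ == "__main__":
    stage_path = os.path.join(tempfile.mkdtemp(), "stage")
    central = sqlite3.connect(":memory:")
    central.execute("CREATE TABLE records (k TEXT PRIMARY KEY, v TEXT)")
    stage_write(stage_path, "session:42", "logged-in")
    print(flush_stage(stage_path, central), "record(s) flushed")
```

    Clearing the staged keys only after the central transaction commits means a crash between the two steps re-sends records rather than losing them, and `INSERT OR REPLACE` makes that replay harmless.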

    Another significant detail I discovered at the same time: all of those transactional guarantees that people promise you with databases? To get them on Linux, you must be running Linux 2.4 and you must have your data on a raw I/O partition. Otherwise there is a layer of buffering at the Linux filesystem level, which means the database does not really know what has and has not hit disk. For most purposes this does not matter, but if you have a hard reliability requirement to meet, you should be aware of it. (I do not think Linux is alone in having obscure limits like this; I just happen to know what they are for that OS.)
