Distributed DBM data storage

agoth has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I'm looking into developing a web-based system that has to rely on two separate geographical locations (US/UK) for DBM file data. Each DBM file will be maintained by a server, and the servers will communicate and update each other using IO::Socket / IO::Select over UDP.

Has anyone here managed this, or does anyone have pointers or case studies they could direct me to?

My major concern is the data stores getting out of sync, and even with an update-checking script running from cron, I haven't thought of a way of preventing corruption yet.
If updates happen to the same record at both sites at similar times, I'm stuffed...

cheers,


Replies are listed 'Best First'.
Re: Distributed DBM data storage by jeroenes (Priest) on Jul 06, 2001 at 15:39 UTC
I would suggest an RDBMS such as Postgres or Oracle. These systems have already solved the distribution problems. If you really need to stick with single files (a situation I would try to avoid), you can try your luck with BerkeleyDB. That DB has a locking mechanism that could be used for this kind of thing. This may or may not need some tinkering with the Perl interface... Cheers, Jeroen
Re: Re: Distributed DBM data storage by agoth (Chaplain) on Jul 06, 2001 at 15:50 UTC
I was shying away from Oracle because of the expense and the network link we have in place, and from MySQL because I 'assumed', maybe incorrectly, that it couldn't. I'll have a look at the possibilities of Postgres. We're already using BerkeleyDB, so the interface is in place and I suspect it will stay.
Re: Re: Re: Distributed DBM data storage by lhoward (Vicar) on Jul 06, 2001 at 16:17 UTC
I can't speak for Oracle, but MySQL's replication strategy is very robust, though not without limitations. With MySQL replication, one DB serves as the master and the others as children. Updates, inserts, and deletes should only be done on the master (which pushes the changes down to the children), but selects can be done on any of the DBs. This works well for many tasks because DB access is often much more read-oriented than write-oriented. You can even do something similar with DBD::Multiplex. It allows you to have a single database handle (from your program's point of view) that connects to multiple DBs on the back end. It'll use any of the DBs when doing a select, but any updates/inserts/deletes will be performed on all the DBs.
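The master/children split described above can be sketched in a few lines of core Perl. This is only an illustration under stated assumptions: the handle names and the handle_for routine are hypothetical, plain strings stand in for real database handles, and this is neither MySQL's nor DBD::Multiplex's own mechanism. Reads are sent only to the children here to keep the sketch short, although the master could be added to the read pool as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical handle names; in a real program these would be DBI
# handles for the master and for each child (replica) database.
my %pool = (
    master   => 'master-us',
    children => [ 'child-us', 'child-uk' ],
);

my $next_child = 0;    # round-robin cursor over the children

# Return the handle a statement should be sent to:
# SELECTs go to a child, everything else goes to the master.
sub handle_for {
    my ($sql) = @_;
    if ( $sql =~ /^\s*select\b/i ) {
        my $children = $pool{children};
        my $handle   = $children->[ $next_child % @$children ];
        $next_child++;
        return $handle;
    }
    return $pool{master};
}

for my $sql (
    'SELECT name FROM users WHERE id = 1',
    'UPDATE users SET name = "x" WHERE id = 1',
    'SELECT name FROM users WHERE id = 2',
) {
    printf "%-45s -> %s\n", $sql, handle_for($sql);
}
```

Because every write goes through a single master, two sites can never update the same record independently, which is the failure the original question worries about. The price is that writes from the remote site have to cross the network link.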
Re^3: Distributed DBM data storage by tadman (Prior) on Jul 06, 2001 at 16:10 UTC
MySQL and Postgres both have "replication" facilities, which probably do what you want to do, only they've been written and tested already, and are supported in a commercial capacity should you require it. Further, if you're using BerkeleyDB, you could very easily switch to an RDBMS by tying your data to SQL instead of a flat file. Your program will hardly notice the difference. There are several examples of this sort of thing floating around, one of which is in the MySQL and mSQL book from O'Reilly. You won't be able to use any of the fancy SQL features without rewriting parts, of course, but at least this is optional.
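As a rough idea of what "tying your data to SQL" might look like, here is a minimal sketch of a tied hash whose reads and writes become SQL statements. The SQLHash class, the records table, and its k/v columns are all made up for illustration, and this is not the example from the O'Reilly book. It assumes DBI and DBD::SQLite are installed; SQLite is used only so the sketch runs without a database server, and a MySQL or Postgres DSN would take its place in practice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class: a hash whose reads and writes become SQL statements.
package SQLHash;
use DBI;

sub TIEHASH {
    my ( $class, $dbh, $table ) = @_;
    # $table is interpolated into SQL, so it must come from trusted code.
    $dbh->do("CREATE TABLE IF NOT EXISTS $table (k TEXT PRIMARY KEY, v TEXT)");
    return bless { dbh => $dbh, table => $table }, $class;
}

sub FETCH {
    my ( $self, $key ) = @_;
    my ($value) = $self->{dbh}->selectrow_array(
        "SELECT v FROM $self->{table} WHERE k = ?", undef, $key );
    return $value;
}

sub STORE {
    my ( $self, $key, $value ) = @_;
    my $dbh = $self->{dbh};
    # Delete-then-insert keeps the SQL portable; it is not atomic as written.
    $dbh->do( "DELETE FROM $self->{table} WHERE k = ?", undef, $key );
    $dbh->do( "INSERT INTO $self->{table} (k, v) VALUES (?, ?)",
        undef, $key, $value );
}

sub EXISTS {
    my ( $self, $key ) = @_;
    my ($n) = $self->{dbh}->selectrow_array(
        "SELECT COUNT(*) FROM $self->{table} WHERE k = ?", undef, $key );
    return $n > 0;
}

sub DELETE {
    my ( $self, $key ) = @_;
    $self->{dbh}->do( "DELETE FROM $self->{table} WHERE k = ?", undef, $key );
}

package main;

# In-memory SQLite database, purely for demonstration.
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1 } );
tie my %data, 'SQLHash', $dbh, 'records';

$data{agoth} = 'UK';    # runs DELETE + INSERT
$data{agoth} = 'US';    # overwrites the same row
print "agoth => $data{agoth}\n";
print "exists\n" if exists $data{agoth};
delete $data{agoth};
print "gone\n" unless exists $data{agoth};
```

Code that already treats the DBM file as a hash would keep working against this class for simple gets and sets. The sketch leaves out FIRSTKEY and NEXTKEY, so keys() and each() would need those methods added before they work.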
Re: Re^3: Distributed DBM data storage by agoth (Chaplain) on Jul 06, 2001 at 16:17 UTC
Re: Distributed DBM data storage by arhuman (Vicar) on Jul 06, 2001 at 16:04 UTC
If you don't want the overhead associated with a <FLAMEBAIT>real</FLAMEBAIT> RDBMS, why not let the file system manage the distributed concurrent access (two web systems accessing ONE distributed file system)? You may check Coda or GFS if interested... "Only Bad Coders Code Badly In Perl" (OBC2BIP)

