Dr. Mu has asked for the wisdom of the Perl Monks concerning the following question:

My first major Perl program, written several years ago, is an online database query system for a 12K race's runner registration. There is one master flat-file database that exists offline and which is manipulated by several offline scripts, including inactive participant pruning and incoming registration data entry. The data gets to the online system by uploading the entire zipped database to a Perl CGI script which extracts the info needed for online queries and indexes it along several dimensions for rapid access. The abbreviated dataset is stored online in a format similar to the master file.

Until now, all changes to the master database have had to occur offline, principally through the data entry script. This can be inconvenient, for example, when someone calls during the registration period and asks that their address or the spelling of their name be changed, since it means contacting the data entry person to update the master database. Consequently, these requests too often fall through the cracks.

I'm contemplating transferring the master database to the online server, so multiple entities can update it as needed, either directly through online transactions for small changes, or by synchronizing clones of the master database for larger changes. Due to the possibility of synchronization conflicts, this could be non-trivial to program from scratch.

I know there are commercial and freeware database systems (like MySQL) that would be of great benefit in this endeavor. Unfortunately my hosting service charges quite a bit extra for an account with MySQL access, so I'd like to consider other alternatives first. They are very good about installing requested modules, though, and even allow private libraries for those modules without universal appeal. Hence (after this long-winded intro) my question: Does anyone have experience with any Perl modules which provide record-level or (preferably) field-level synchronization support for flat-file databases? If so, are there any that you would recommend?

Many thanks!

Replies are listed 'Best First'.
Re: Database Synchronization
by PodMaster (Abbot) on Dec 03, 2002 at 08:29 UTC
    Don't bother with "flat-file databases", just use BerkeleyDB. It supports (i think) everything you need.

    Start with the Berkeley DB Tutorial and Reference Guide, especially with What is Berkeley DB? and What Berkeley DB is not, and convince yourself it's what you need, then get it ;)

    update: A always, BerkleyDB's simpler cousin DB_File is also available ;)(and this might be interesting as well)


    MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      Thank you for the pointer to Berkeley DB. Having read portions of the tutorial, I can assure you that knowing about this when I began the project would've saved me hours of coding! And yet this one quote from that tutorial gives me pause:
      Berkeley DB replication groups consist of some number of independently configured database environments. There is a single master database environment and one or more client database environments. Master environments support both database reads and writes; client environments support only database reads.
      This speaks directly to my synchronization issue, as it will be necessary to modify a client (clone) database offline for later synchronization to the master. This is necessary during data entry, for example, since our data entry person is a volunteer working at home with a dial-up internet connection. For her to do her work while maintaining the persistent connection required by a purely transactional system is expecting too much. Better, would be for her to have a clone of the master database which gets modified. When she connects to the server, those modifications would get uploaded and any changes made in the interim to the master database would get downloaded. Synchronization conflicts occur when the same record (or field) gets changed in two different copies of the database. They can either be logged for later human intervention or dealt with algorithmically.

      What I was hoping for was a module which automated the synchronization bookkeeping with hooks to the main program for dealing with conflicts. One that also includes journaling for reversing changes would be a plus.

        I've not seen anything dealing with that issue (and I like to keep things as simple as possible).

        Here's what I see happening. Your data entry person has a read-only copy of the database, and any changes she makes do not get recorded in the database, but a file which she then uploads to your server, and which then hopefully gets incorporated into the master database.

        You can have a special program on the server to control this, where the data-entry person can chose to which of the pending updates get entered into the master.

        You can build your own journaling system around that, or you can build one from the logs.

        I'd like to see what you come up with ;)


        MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
        ** The Third rule of perl club is a statement of fact: pod is sexy.