AgentM has asked for the wisdom of the Perl Monks concerning the following question:

So here's the story, folks:

I'm mucking around with some config files that I know were tied with the standard AnyDBM_File (I wrote the code for portability). As you know, AnyDBM_File chooses a DBM module for the programmer (the default is NDBM_File, next is DB_File, etc.). To add to the confusion, some nitwit actually changed my code and used yet another dbfile format for "optimization". (Yes, I really dug myself a grave by using AnyDBM_File in the first place...) Now it's my turn. Let's imagine that I have a lot of these config files, but they come from different machines and even different versions of Perl. All I want is to be able to open and manipulate them all reliably without too much confusion. But my dilemma arises when one realizes that these config files are not all using the same type of hash file!

So while AnyDBM_File makes it easy to create a dbfile without knowing its type, it won't work backwards and tell me which type the file I wish to open is!

So, of course, the armed monk hits CPAN and comes up with File::MMagic, which seems like a direct match until he realizes that he would need to create his own external magic table (no support for common AND uncommon mixes of file types; discussed later)!

Being as lazy as he is, the resourceful monk tries opening any old db with a certain dbfile module, hoping that success means that it's the right format and failure means that it isn't. EEEEEF! Wrong again! Some modules are excellent at this, while some (lowly) modules are happy to assign random binary filler data to a hash key. (Note: some such modules were written by unnamed third parties and may not actually be available on CPAN. They also manage to completely ignore any header information, going straight for the data....)

So the monk, completely exhausted, resorts to his wonderful friends at perlmonks.org and asks, "How can I detect a DBM file type where the above methods have failed and without a massive if-elsif block?"

Please keep in mind that File::MMagic would fail in a lot of cases, or that the ensuing table would become overly complicated and bloated....

Thanks in advance if you have any tips, even if they deal with File::MMagic or the UNIX file command, which didn't work too well either.
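
For reference, the header-sniffing idea can at least be written as a lookup table instead of an if-elsif chain. This is only a sketch: the offsets and magic numbers below are the Berkeley DB and GDBM entries as found in the magic database of the UNIX file command, the table is certainly incomplete, and the names @MAGIC, guess_dbm_type and guess_dbm_file are made up for illustration. NDBM, SDBM and ODBM files carry no magic number at all, so they (and any third-party format) fall through as "unknown", which is exactly the weakness described above.

```perl
use strict;
use warnings;

# Each entry: [ byte offset, 32-bit magic number, label ].
# Assumption: values as listed in the file(1) magic database; not exhaustive.
my @MAGIC = (
    [ 12, 0x00061561, 'Berkeley DB hash' ],
    [ 12, 0x00053162, 'Berkeley DB btree' ],
    [  0, 0x00061561, 'Berkeley DB 1.85 hash' ],
    [  0, 0x00053162, 'Berkeley DB 1.85 btree' ],
    [  0, 0x13579ace, 'GDBM' ],
);

# Takes the first bytes of a file as a byte string and returns a label,
# or undef when no known magic number is present.
sub guess_dbm_type {
    my ($header) = @_;
    for my $entry (@MAGIC) {
        my ($offset, $magic, $label) = @$entry;
        next if length($header) < $offset + 4;
        my $word = substr $header, $offset, 4;
        # The magic is written in the creating machine's byte order,
        # so compare against both big- and little-endian readings.
        return $label
            if unpack('N', $word) == $magic || unpack('V', $word) == $magic;
    }
    return;    # no header magic: NDBM/SDBM/ODBM or something else entirely
}

# Convenience wrapper that reads the header from a file on disk.
sub guess_dbm_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    read $fh, my $header, 16;
    close $fh;
    return guess_dbm_type(defined $header ? $header : '');
}
```

A result of undef should be read as "no idea", not as "not a DBM file"; a label only says which header was found, not that the data inside is readable on this machine.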

AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.

Re (tilly) 1: Not a DBengine Question.
by tilly (Archbishop) on Nov 12, 2000 at 07:32 UTC
    The most reliable answer is going to be to have a utility that extracts the configuration into a standard text format which is portable. The utility is easy to write, but it will need to be run on the original machines.

    Sorry.
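
The extraction utility described above might look roughly like the following. It is a minimal sketch, not a finished tool: the one-pair-per-line, tab-separated, percent-escaped format is just one possible choice, the helper names (escape_field, unescape_field, dump_hash, load_hash) are invented here, and keys and values are assumed to be plain byte strings. The functions take any hash reference, so the same code works whichever DBM module the hash is tied to.

```perl
use strict;
use warnings;

# Percent-escape everything except printable ASCII (and '%' itself),
# so tabs, newlines and binary bytes cannot break the line format.
sub escape_field {
    my ($s) = @_;
    $s =~ s/([^\x21-\x24\x26-\x7e])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

# Reverse of escape_field.
sub unescape_field {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Write each pair of a (tied or plain) hash as "key<TAB>value\n".
sub dump_hash {
    my ($hash, $out) = @_;
    while (my ($k, $v) = each %$hash) {
        print {$out} escape_field($k), "\t", escape_field($v), "\n";
    }
}

# Read the text format back into a (tied or plain) hash.
sub load_hash {
    my ($in, $hash) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my ($k, $v) = split /\t/, $line, 2;
        $hash->{ unescape_field($k) } = unescape_field(defined $v ? $v : '');
    }
}

# On the original machine (the file name 'config' is hypothetical):
#   use AnyDBM_File; use Fcntl;
#   tie my %db, 'AnyDBM_File', 'config', O_RDONLY, 0644 or die "tie: $!";
#   dump_hash(\%db, \*STDOUT);
```

The dump has to run where the file was created, because only that machine is guaranteed to have the right DBM module and byte order; the load side can then run anywhere.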

    In fact I came up with an interesting cautionary example.

    Consider DB_File. It allows you to create (among other things) a hash or BTree with a custom hashing or comparison function. Well, one of the nice things about a BTree is that it is trivial to get keys to come back in sorted order. So use the following neat ordering function:

    sub { use locale; $_[0] cmp $_[1]; }

    What does this do? It sorts in a locale-specific order. So if you are in Europe and have a few characters not in ASCII, it sorts them correctly.

    Now the point? There is no external tool which can guess this binary format. In fact all of the standard tools from Sleepycat will mess up on this. And better than that, if some user changes their locale and then edits the data? Instant undetectable database corruption!
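
To make the hazard concrete, here is roughly how such a comparison function gets attached to a DB_File BTree. This is a sketch assuming DB_File is installed; the in-memory database (undef as the file name) and the sample keys are only there to keep the example self-contained. Note that the comparison function lives in the program, not in the file, so nothing on disk records which ordering was used.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;

# The ordering is defined here, in code. The database file itself
# carries no trace of it, or of the locale that was active.
my $btree = DB_File::BTREEINFO->new;
$btree->{compare} = sub { use locale; $_[0] cmp $_[1] };

# undef as the file name asks DB_File for an in-memory database.
tie my %config, 'DB_File', undef, O_RDWR | O_CREAT, 0666, $btree
    or die "Cannot tie BTree: $!";

# Sample keys, chosen only for illustration.
$config{$_} = 1 for qw(zebra apple Mango);

# Keys come back in whatever order the comparison function defines
# under the locale in effect when this process runs.
my @keys_in_tree_order = keys %config;
print "$_\n" for @keys_in_tree_order;
```

Run the same program under a different locale setting and the key order may differ; with a real file on disk, that is the point at which the tree's structure stops matching its comparison function.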

    (This is not just a problem for BerkeleyDB, of course; many programs with binary formats are susceptible to the same problem. For instance, a C++ program using the Rogue Wave string library can hit similar issues.)

    Which goes to show why it is important to make sure that you have backups of data in a *portable* format...