mendicantMatt has asked for the wisdom of the Perl Monks concerning the following question:

Four years ago I made a DBM file (.dir/.pag pair) for an academic project, and now I can't get the data back out.

I'm guessing the problem is a mismatch between the DBM library the file was written with and the ones I'm trying to read it with, but I don't know how to find out which one will make proper sense of the .dir file. The best I can get so far is some truncated value data smashed into the key that is returned.

At the time I was on an Athena system, which I believe is a version of Berkeley Unix; now I'm working on my web-hoster's Linux system (ver 2.2 or higher, as far as I can tell). Both have/had professional sysadmins maintaining the Perl installations.

The keys are short strings (usernames). The values are long strings with many values glued together with a '::' separator: the first value is a short string password, the second is a 3-digit ID number, the rest are time()stamps appended as a log.
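
For readers following along, the value layout described above can be sketched in a few lines of Perl. This is only an illustration: the sub name `parse_record` and the sample value are made up, and it assumes no field ever contains '::' itself.

```perl
use strict;
use warnings;

# Split one stored value into its parts, assuming the layout described
# above: password, 3-digit ID, then zero or more time() stamps,
# all joined with '::'.
sub parse_record {
    my ($value) = @_;
    my ($password, $id, @stamps) = split /::/, $value;
    return { password => $password, id => $id, stamps => \@stamps };
}

# Made-up sample value, for illustration only.
my $rec = parse_record('secret::042::1089080400::1089084000');
printf "id=%s, %d timestamps, first at %s\n",
    $rec->{id},
    scalar @{ $rec->{stamps} },
    scalar localtime($rec->{stamps}[0]);
```

The same sub would work on values recovered by any route, whether through a working DBM module or from raw text pulled out of the .pag file.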

Matching those logs-of-timestamps to the usernames is what I'm desperately in need of solving.

Some plain-text exploration in the large .pag file shows lots of that kind of data there, but also lots of weird binary sections and jumbled ordering, as one might expect.

I've tried using AnyDBM_File and each of its referents: NDBM_File, DB_File, GDBM_File, ODBM_File, and SDBM_File separately (with diagnostics).

I'm using dbmopen (because the arguments to tie confuse me) with the third argument undef to prevent the existing files from being overwritten by an open attempt. The keys are being retrieved with each().
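
Since the tie arguments are mentioned as a stumbling block, here is a minimal sketch of a read-only tie. The sub name `read_dbm` and the base name in the usage comment are hypothetical. The argument order shown (base name, open flags, creation mode) is the one used by SDBM_File, NDBM_File and ODBM_File; DB_File and GDBM_File take different arguments, so this sketch does not cover them as written.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);   # flag constants for the tie() call
use SDBM_File;            # load NDBM_File or ODBM_File instead to try those

# Open an existing DBM pair read-only and return its contents as a
# plain hash reference.
# $base is the file name WITHOUT the .dir/.pag extension.
sub read_dbm {
    my ($class, $base) = @_;
    my %db;
    # tie arguments: hash, class, base name, open flags, creation mode.
    # O_RDONLY without O_CREAT means nothing is created or overwritten.
    tie(%db, $class, $base, O_RDONLY, 0)
        or die "Cannot tie $base with $class: $!\n";
    my %copy;
    while (my ($key, $value) = each %db) {
        $copy{$key} = $value;
    }
    untie %db;
    return \%copy;
}

# Usage (hypothetical base name):
#   my $data = read_dbm('SDBM_File', 'userlog');
```

A successful tie only shows that the module accepted the file, not that it decoded it correctly, so the keys still need a sanity check by eye.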

Here are the results:

AnyDBM, NDBM, and GDBM produce uncaught errors, period, end execution.
ODBM stops execution after a terse complaint of a segmentation fault.
DB_File stops execution after complaining (twice) of an uninitialized value in subroutine entry at line 263 (from a perl 5.6.1 directory/.pm file).
SDBM manages to sometimes produce output: the first key (prefaced by a little garbage) includes both key and value info for the first entry I can see opening the file as plaintext, all the values are blank, and the subsequent keys are either large garbage (including some of the module code or its state) or completely blank, and eventually produce an out of memory error.

Please, oh please, ye wise monks, where can a Novice brother like myself turn to untangle this recalcitrant pair of DBM files?

Re: Unwinding an (unknown type) DBM
by tachyon (Chancellor) on Jul 06, 2004 at 02:20 UTC

    You may be able to get some joy simply using the strings(1) command, for example in my sendmail conf dir the access text file looks like:

    [root@mail]# cat access
    # Check the /usr/share/doc/sendmail/README.cf file for a description
    # of the format of this file. (search for access_db in that file)
    localhost.localdomain RELAY
    localhost RELAY
    127.0.0.1 RELAY
    [snip]

    And using strings(1) on the access.db dbm file ( file(1) *may* tell you the type BTW) gets me back:

    [root@mail]# file access.db
    access.db: Berkeley DB (Hash, version 7, native byte-order)
    [root@mail]# strings access.db
    RELAY
    localhost.localdomain
    RELAY
    127.0.0.1
    RELAY
    localhost
    [snip]
    [root@mail]#

    You may just luck out and find the data dumps out in a format you can parse back into shape with a few lines of perl. If not good luck.
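
As a rough sketch of the "few lines of perl" idea, the sub below pulls runs of printable ASCII out of a binary file, approximately what strings(1) prints. The sub name `printable_runs`, the default minimum length of 4 and the file name in the usage comment are assumptions for illustration.

```perl
use strict;
use warnings;

# Return every run of at least $min printable ASCII characters found in
# a binary file, roughly what strings(1) prints.
sub printable_runs {
    my ($path, $min) = @_;
    $min ||= 4;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    local $/;                       # slurp mode: read the whole file at once
    my $data = <$fh>;
    close $fh;
    return $data =~ /([\x20-\x7e]{$min,})/g;
}

# Usage (hypothetical file name):
#   print "$_\n" for printable_runs('userlog.pag');
```

Doing the extraction in Perl rather than piping from strings(1) keeps the runs in a list, so they can be filtered or reassembled in the same script.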

    cheers

    tachyon

      Thanks, monks, but I'm not out of the woods yet.

      file filename.pag and filename.dir both return a very terse answer: data

      strings does in fact strip out the garbage, but the records still aren't intact: e.g. fragments of the same record appear in many places, inconsistent breaks between records, etc.
      I do note that the key for each value is often found at the end of the relevant value string, if that's a clue for any demigods who read this.

      I've been trying this on my webhost's Linux server instead of my own (recently rebuilt) Win32 system, hoping the implementations would differ less from the original Athena, but that also means I can't go installing every SDBM_File module I can find.

      I've searched the DBM answers here over the last 3 years or so, and anything I can get my hands on, still to no avail.
      I did see a partial reference to a WhichDBM module - does such a thing really exist?

      Since the size of the data isn't huge (dozens of timestamps for 32 people), I'm off to hack at what I got from strings, but it's still very ugly and (after much reading) it seems I'm not the only one who has this kind of question unanswered.
      "How do I find out what to use to unwind an old DBM file, original type unknown?"

      Thanks again, especially tachyon, for the helpful suggestions.

        You just may find that conv2gdbm does what you need. It certainly expects your .pag and .dir files as the input. It is part of the gdbm distro. As you note, this is probably not an uncommon problem, so perhaps this is the answer. The 'data' return from file(1) just means it recognises it as a binary file but has no idea what it is.

        cheers

        tachyon

        The "WhichDBM_File" reference was in the title of one of my postings. It was simply an effort to concoct a clever title calling attention to a frustration similar to yours. To the best of my knowledge, there is no WhichDBM_File module. Sorry for any false hopes I may have raised...

        I've been trying this on my webhost's Linux server instead of my own (recently rebuilt) Win32 system, hoping the implementations would differ less from the original Athena, but that also means I can't go installing every SDBM_File module I can find.

        If you have a shell account there's no reason you can't install modules locally. Just install the module into a directory you can write to and 'use lib' in your script to load it. You can make sure you loaded the right version by printing out %INC.
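
A minimal sketch of that approach follows. The directory `$ENV{HOME}/perl5/lib` is a hypothetical private install location; substitute wherever the module was actually installed.

```perl
use strict;
use warnings;
# Hypothetical private install directory; adjust to wherever the module went.
use lib "$ENV{HOME}/perl5/lib";
use SDBM_File;

# %INC maps each loaded file to the path it was actually loaded from.
print "SDBM_File loaded from $INC{'SDBM_File.pm'}\n";
```

If the printed path points at the system perl directory rather than the private one, the locally installed copy was not the one picked up.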

        -sam

Re: Unwinding an (unknown type) DBM
by PodMaster (Abbot) on Jul 06, 2004 at 02:00 UTC
    SDBM manages to sometimes produce output: the first key (prefaced by a little garbage) includes both key and value info for ...
    I suggest you get your hands on as many SDBM_File versions as you can and try them out until one works, or better yet, google for sdbm utilities.

Re: Unwinding an (unknown type) DBM
by eserte (Deacon) on Jul 06, 2004 at 08:43 UTC
    According to the AnyDBM_File documentation, SDBM_File is not byte-order independent. So if you're on a little endian machine try it again on a big endian machine and vice versa.
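
To find out which kind of machine a given perl is running on, a short check like the following should do. It is a sketch using only core features; the variable names are illustrative.

```perl
use strict;
use warnings;
use Config;

# Pack the integer 1 in native byte order and see which end the 1 landed on.
my $first_byte = unpack 'C', pack('L', 1);
my $endian = $first_byte == 1 ? 'little' : 'big';

print "This perl runs on a $endian-endian machine ",
      "(byteorder $Config{byteorder})\n";
```

Running it on both the old and the new host, where that is still possible, shows whether byte order is even a candidate explanation.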
Re: Unwinding an (unknown type) DBM
by theorbtwo (Prior) on Jul 06, 2004 at 02:09 UTC

    Since I don't see that you've mentioned it above, you may want to run the file command on the files in question -- it may tell you what dbm and possibly even what version of it you need.

    You should probably use this advice in conjunction with PodMaster's.

