Stamp_Guy has asked for the wisdom of the Perl Monks concerning the following question:

Hello
I was recently introduced to the DB_File module. I've been scouring the web for detailed information on using it but have been disappointed. I have read the section about it in the Llama book, done a Super Search and read everything I could find, read the documentation that comes with the module, and searched Google. Basically, I need answers to some specific questions, particularly regarding DB_BTREE. Any help would be greatly appreciated. If anyone has extensive experience with this module and wouldn't mind some questions, please /msg me! Thanks!

Stamp_Guy

Re: Question regarding DB_File
by bikeNomad (Priest) on Jul 07, 2001 at 00:37 UTC
    You can look at the BerkeleyDB module documentation; its tied hash implementation is more or less the same as DB_File's, except that it gives you full control over the database. You can also look at the documentation of the Berkeley DB API itself, which can give you an idea of what it can do.

    I don't know what you mean by "display in the order"; it would depend on how you're doing the displaying. Do you mean that keys(%myDB) will return the order of insertion?

    If you're interested in the order of insertion, you may want to look at the DB_RECNO type instead. The BTree will order by key value.
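As a rough illustration of the DB_RECNO suggestion above, here is a minimal sketch. It assumes DB_File is installed and uses undef as the filename to get an in-memory database; the array name and sample values are made up for the example.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT
use DB_File;    # exports $DB_RECNO

# A RECNO database is tied to an array, so records keep the order
# in which they were appended. undef = in-memory (no file on disk).
my @recno;
tie @recno, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_RECNO
    or die "Cannot create in-memory recno database: $!";

# Hypothetical sample records, deliberately not in sorted order.
push @recno, $_ for qw(ccc aaa bbb);

my @in_order = @recno;    # copy out before untying
print "@in_order\n";      # records come back in insertion order

untie @recno;
```

The trade-off is that a RECNO database is indexed by record number, so there is no lookup by key as with a BTree.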

    Look at Sleepycat's documentation on Berkeley DB for details on how this all works. And look into using the BerkeleyDB module instead of DB_File if you want finer-grained control over ordering, duplicate keys, locking, etc.

      I'm going to describe the catalog number system in detail:
      There are several types of catalog numbers:
      • Numbers with no leading or trailing alphabetical characters (e.g. 219)
      • Numbers with leading alphabetical characters (e.g. C18)
      • Numbers with trailing alphabetical characters (e.g. 219a)
      • Numbers with leading and trailing alphabetical characters (e.g. C3a)
      • Numbers that are a range of other numbers (e.g. 751-751a or 756-765)
      When sorted, a list of the following catalog numbers:
      219, C18, 291a, C3a, 756-765, C21/22, 219a, 756, 291, C20, C30

      Should look like this:

      • 219
      • 219a
      • 291
      • 291a
      • 756
      • 756-765
      • C3a
      • C18
      • C20
      • C21/22
      • C30
      I can't have normal numbers out of order (as an alphabetical sort would do, with 10 coming before 9), nor can I change the catalog numbering system. How is someone supposed to be able to sort a list like that? It's totally beyond me! I'm really doubting it's possible. I would love for someone to prove me wrong!!!

      Stamp_Guy

        It is possible to write a sort function that will produce any arbitrary order your heart desires. However, it might be difficult to write, and the resulting program might not be very efficient. A powerful technique is to transform the keys into something that can be sorted alphabetically. This is much easier to think about, and lets you take advantage of the special key-reduction optimizations that are built into the newer versions of DB_File.

        If you know that the catalog numbers never have more than a certain number of digits, you can make them sortable by adding leading zeroes to numbers, like this:

        • 9 -> 009
        • 123a -> 123a
        • C30 -> C030
        • C21/22 -> C021/022
        ...and so on. You can make this transformation with a simple statement like:
        $key =~ s/(\d+)/sprintf "%03d", $1/eg;
        Remember to keep a copy of the original key around so you can show it to your users, and they won't ever have to know what you've done.
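To make the zero-padding idea concrete, here is a minimal sketch in plain Perl, with no database involved. The helper name sortable_key and its width parameter are invented for this example, and it assumes no catalog number contains a run of more than three digits.

```perl
use strict;
use warnings;

# Hypothetical helper: pad every run of digits to a fixed width so that
# a plain string comparison orders the numeric parts correctly.
# Assumption: no run of digits is longer than $width.
sub sortable_key {
    my ($catalog_no, $width) = @_;
    $width = 3 unless defined $width;
    (my $key = $catalog_no) =~ s/(\d+)/sprintf "%0${width}d", $1/eg;
    return $key;
}

my @catalog = qw(219 C18 291a C3a 756-765 C21/22 219a 756 291 C20 C30);

# Keep the original number alongside its sortable form.
my %original = map { sortable_key($_) => $_ } @catalog;

# Sort on the transformed keys, then show the untouched originals.
my @sorted = map { $original{$_} } sort keys %original;

print "$_\n" for @sorted;
```

With this input the originals come out in the order requested earlier in the thread: 219, 219a, 291, 291a, 756, 756-765, C3a, C18, C20, C21/22, C30.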

        If you don't have a guarantee about how many digits can be in a catalog number, there are clever solutions... You could, for instance, store the number of digits followed by the actual digits themselves.
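One way to read the "number of digits followed by the digits" suggestion is sketched below. The helper name length_prefixed_key is invented here, and the sketch assumes that no run of digits is longer than nine characters (so the length fits in one digit) and that numbers carry no leading zeroes.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix each run of digits with its length, so that
# shorter (smaller) numbers sort before longer (larger) ones as strings.
# Assumptions: runs of at most 9 digits, no leading zeroes.
sub length_prefixed_key {
    my ($catalog_no) = @_;
    (my $key = $catalog_no) =~ s/(\d+)/length($1) . $1/eg;
    return $key;
}

my @catalog = qw(10 9 100 C18 2 1000 C3a);

# Schwartzian transform: compute each key once, sort on it, keep originals.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ length_prefixed_key($_), $_ ] } @catalog;

print "@sorted\n";
```

Here 9 becomes 19 and 10 becomes 210, so 9 sorts first without any fixed limit of three digits.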

Re: Question regarding DB_File
by no_slogan (Deacon) on Jul 07, 2001 at 00:25 UTC
    A B-Tree file stores its keys in sorted order, so if you do this:
    $btree{"ccc"} = $foo;
    $btree{"aaa"} = $bar;
    $btree{"bbb"} = $baz;
    print join(" ", keys %btree), "\n";
    ... you'll get "aaa bbb ccc" out. (Assuming you're using the default sorting order in your btree.) Maintaining this sorted order is how B-Trees are able to locate a particular piece of information quickly. There's no easy way to get the keys out in insertion order, unless that happens to be the same as sorted order.
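For anyone who wants to try this, here is a self-contained sketch of the same experiment. It assumes DB_File is installed and uses undef as the filename so the btree lives in memory; the numeric keys are added only to show that the default order is a string order.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT
use DB_File;    # exports $DB_BTREE

# undef as the filename requests an in-memory database.
my %btree;
tie %btree, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_BTREE
    or die "Cannot create in-memory btree: $!";

# Insert out of order; the btree hands the keys back sorted.
$btree{ccc} = 'foo';
$btree{aaa} = 'bar';
$btree{bbb} = 'baz';
my @words = keys %btree;
print "@words\n";

# The default comparison is lexical, so numeric keys sort as strings.
%btree = ();
$btree{$_} = 1 for (9, 10, 100);
my @numbers = keys %btree;
print "@numbers\n";

untie %btree;
```

The second list comes back as 10, 100, 9, which is why numeric keys need either a transformation or a custom comparison.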

    The only time this is really useful, though, is when your btree file is too big to call keys %btree on in the first place. Otherwise, it's not much better than sort keys %hash. If your file is that big, you'll need to either use each to walk through the whole thing, or use the seq method of the tied DB_File object to find a particular range.
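The two access patterns mentioned above might look roughly like this. It is a sketch only: the data is invented (zero-padded sort keys mapped to hypothetical catalog numbers), and the database is in-memory so the example leaves nothing on disk.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;    # exports $DB_BTREE, R_CURSOR, R_NEXT

my %btree;
my $db = tie %btree, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $DB_BTREE
    or die "Cannot create in-memory btree: $!";

# Illustrative data: sortable key => original catalog number.
my @rows = ( [ '219', '219' ], [ '291', '291' ], [ '756', '756' ],
             [ 'C003a', 'C3a' ], [ 'C018', 'C18' ], [ 'C020', 'C20' ],
             [ 'D001', 'D1' ] );
$btree{ $_->[0] } = $_->[1] for @rows;

# 1. Walk the whole file one pair at a time instead of calling keys.
my @all;
while (my ($k, $v) = each %btree) {
    push @all, $k;
}

# 2. Jump to the first key >= 'C' with R_CURSOR, then step with R_NEXT
#    until the keys leave the range. seq returns 0 on success and
#    overwrites $key and $value with the record it found.
my @c_entries;
my ($key, $value) = ('C', '');
for (my $status = $db->seq($key, $value, R_CURSOR);
     $status == 0 && $key =~ /^C/;
     $status = $db->seq($key, $value, R_NEXT)) {
    push @c_entries, "$key=$value";
}

print "@all\n@c_entries\n";

undef $db;      # drop the object reference before untying
untie %btree;
```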

Re: Question regarding DB_File
by LD2 (Curate) on Jul 07, 2001 at 00:59 UTC
    Take a look at this document; it gives a good example of DB_BTREE, as well as how to override the default sort order.
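As a hedged sketch of what overriding the sort order can look like, the code below installs a comparison callback through a DB_File::BTREEINFO object. The padded helper and the three-digit limit are assumptions borrowed from the zero-padding idea earlier in the thread, not something the linked document prescribes.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

# Hypothetical helper: zero-pad digit runs (assumes at most 3 digits).
sub padded {
    my ($k) = @_;
    $k =~ s/(\d+)/sprintf "%03d", $1/eg;
    return $k;
}

# Use a private BTREEINFO object rather than changing the global $DB_BTREE.
my $info = DB_File::BTREEINFO->new;
$info->{compare} = sub {
    my ($k1, $k2) = @_;
    # Compare padded forms; fall back to the raw keys so that two
    # distinct keys never compare as equal.
    return padded($k1) cmp padded($k2) || $k1 cmp $k2;
};

my %catalog;
tie %catalog, 'DB_File', undef, O_RDWR|O_CREAT, 0666, $info
    or die "Cannot create in-memory btree: $!";

$catalog{$_} = 1
    for qw(219 C18 291a C3a 756-765 C21/22 219a 756 291 C20 C30);

my @ordered = keys %catalog;
print "$_\n" for @ordered;

untie %catalog;
```

The keys are stored unmodified, so no separate copy of the original number is needed. The catch is that a database file built with a custom comparison has to be opened with that same comparison every time.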