in reply to fast+generic interface for range-based indexed retrieval

BerkeleyDB cursors handle this exact issue. You just need to use the BTree option so that the records will be sorted.

Re^2: fast+generic interface for range-based indexed retrieval
by Zarchne (Novice) on Dec 12, 2008 at 06:12 UTC
    Aha, with DB_SET_RANGE.

    Otherwise, I was wondering if storing a bitmap of the keys used would be useful. Probably not.

      No, you don't need DB_SET_RANGE. You position the cursor on the key you want with DB_SET and then keep calling DB_NEXT until you're done.
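For comparison, here is a minimal sketch of the cursor pattern being discussed, written in plain Python with a sorted list standing in for the BTree. It is not BerkeleyDB code. In BerkeleyDB, DB_SET positions the cursor on an exact key match, while DB_SET_RANGE positions it on the smallest key greater than or equal to the one given, so the second form is what a range query needs whenever the lower bound may not be stored. The function name `range_scan` and the sample keys are made up for illustration.

```python
import bisect

def range_scan(sorted_keys, start, end):
    """Yield every key k with start <= k <= end from an ascending list."""
    # bisect_left returns the first position whose key is >= start.
    # This mirrors the positioning step of a DB_SET_RANGE cursor call.
    i = bisect.bisect_left(sorted_keys, start)
    # Stepping forward one entry at a time mirrors repeated DB_NEXT calls.
    while i < len(sorted_keys) and sorted_keys[i] <= end:
        yield sorted_keys[i]
        i += 1

# Hypothetical keys, already in sorted order as a BTree would keep them.
keys = ["0001.5000000", "0002.2500000", "0010.0000000", "0450.1250000"]

# The lower bound "0002.0000000" is not a stored key, so an exact-match
# lookup would find nothing, but the range positioning still works.
hits = list(range_scan(keys, "0002.0000000", "0100.0000000"))
```

The loop stops as soon as a key sorts past the upper bound, so the cost is one lookup plus the number of matching records.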
Re^2: fast+generic interface for range-based indexed retrieval
by jae_63 (Beadle) on Dec 12, 2008 at 14:38 UTC
    Thanks for all the responses. In fact, I am already using BerkeleyDB's BTrees (and cursors) to get good performance; the problem is that my code winds up being specialized for each database and rather ugly. My keys (to date) are usually floating point numbers which I format like %012.7f since these numbers will never exceed 9999. I perform the one-time database load using db_load.
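To illustrate why the %012.7f format works as a BTree key, here is a small Python sketch (the helper name `make_key` is hypothetical). Zero-padding every key to the same width makes plain string comparison agree with numeric comparison, provided the values are non-negative and stay within the stated bound of 9999.

```python
def make_key(x):
    """Format a float as a fixed-width, zero-padded key string."""
    # Width 12 = 4 integer digits + 1 decimal point + 7 decimal digits.
    # Values outside 0..9999 would break the fixed width (or add a sign),
    # so they are rejected here. That check is an assumption of this sketch.
    if not 0 <= x <= 9999:
        raise ValueError("key out of range: %r" % (x,))
    return "%012.7f" % x

values = [12.5, 3.0, 1234.0000001, 0.25]
keys = [make_key(v) for v in values]

# Sorting the strings gives the same order as sorting the numbers.
same_order = sorted(keys) == [make_key(v) for v in sorted(values)]
```

Without the padding, "12.5" would sort before "3.0" as a string, which is exactly what the fixed width prevents.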

    It sounds as though I can't do much better than my current solution and still obtain good performance. One thing that I can do to clean things up is use a Perl iterator.

    Perhaps I can set this up to include a few parameters and hooks to define the format of the query string. This would yield the generic interface that I'm looking for, and perhaps would be worth releasing as a module.
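One possible shape for such an interface, sketched in Python rather than Perl and with an in-memory sorted list standing in for the BerkeleyDB BTree. The class name `RangeIndex`, the `between` method, and the `format_key` hook are all hypothetical. The point is only that the key format becomes a parameter, so the same iterator can serve databases with different key layouts.

```python
import bisect

class RangeIndex:
    """Sorted index whose key format is supplied by the caller."""

    def __init__(self, records, format_key):
        # records: iterable of (raw_key, value) pairs.
        # format_key: hook that turns a raw key into its stored string form.
        self._format_key = format_key
        pairs = sorted(((format_key(k), v) for k, v in records),
                       key=lambda pair: pair[0])
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def between(self, low, high):
        """Iterate (stored_key, value) for low <= raw key <= high."""
        lo_key = self._format_key(low)
        hi_key = self._format_key(high)
        # Position on the first stored key >= lo_key, then walk forward.
        i = bisect.bisect_left(self._keys, lo_key)
        while i < len(self._keys) and self._keys[i] <= hi_key:
            yield self._keys[i], self._values[i]
            i += 1

# Usage with the float format mentioned in the thread.
index = RangeIndex([(3.5, "c"), (1.25, "a"), (2.0, "b")],
                   lambda x: "%012.7f" % x)
result = [value for _, value in index.between(1.5, 3.5)]
```

A different database would pass a different `format_key`, and the iteration code would stay the same.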

    The Bowtie remark was interesting; I'm well-acquainted with that relatively obscure package (as well as the equally obscure Dynamite), but don't think that the very cool and underutilized Burrows-Wheeler transform is applicable here.

    Thanks again to all ...

      My keys (to date) are usually floating point numbers which I format like %012.7f since these numbers will never exceed 9999.

      Do you need 7 decimal places of accuracy? If your application could get by with 6, you could multiply your floats by 1e6 and store them as integers--which would probably reduce space and speed things up a lot whatever DB you use.
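A rough Python sketch of that idea, with assumptions stated: the names `encode` and `decode` are made up, keys are assumed non-negative, and the integer is packed big-endian so that byte-wise comparison matches numeric order. Note that 9999 * 1e6 is about 1e10, which does not fit in 32 bits, so this sketch uses an 8-byte unsigned integer. That is still smaller than the 12-byte string key.

```python
import struct

SCALE = 10**6  # six decimal places; this is the suggestion's assumption

def encode(x):
    """Scale a non-negative float to an integer and pack it big-endian."""
    n = round(x * SCALE)
    if n < 0:
        raise ValueError("negative keys are not handled in this sketch")
    # ">Q" = big-endian unsigned 64-bit, so byte order equals numeric order.
    return struct.pack(">Q", n)

def decode(key_bytes):
    """Recover the float (to SCALE precision) from a packed key."""
    return struct.unpack(">Q", key_bytes)[0] / SCALE

small, mid, large = encode(1.5), encode(12.25), encode(9999.0)
ordered = small < mid < large
```

Whether this is actually faster depends on the database and its key comparison function; the sketch only shows that ordering is preserved and the key is shorter.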


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Yes, I do need all those decimal places sometimes. Using integers is a nice idea, though.