in reply to Re (tilly) 2: millions of records in a Hash
in thread millions of records in a Hash

You are certainly right that DBMs, like BDB, seem to fit the problem well.
On the other hand, maybe the seeker needs multiuser access or wants to connect to his data remotely.
We don't know that.
After all, RDBMs provide many more services and usually fit smaller problems too, often avoiding scaling problems as well.
So the general advice is often to use RDBMs where DBMs would suffice. That is not always based on a misunderstanding; perhaps more often it reflects the simple fact that people deal more with RDBMs than with DBMs.
People are running multiuser operating systems for mere desktop usage, after all. And many are happy with that, even though a single-user system would suffice.

Another approach that is often forgotten is to write your own storage method, which, given the seeker's description, doesn't seem to be out of reach and could well result in the most performant solution.

Re (tilly) 4: millions of records in a Hash
by tilly (Archbishop) on Feb 25, 2002 at 07:04 UTC
    This is all true. But I am still cautious about telling people to use an RDBM when they either don't have the background to understand one (and I don't have energy to teach that), or they might understand both and have a good reason for using a dbm.

    As for writing your own storage method, I would strongly discourage people from doing that unless they already know, for instance, the internals of how a dbm works. And if someone comes back and asks me for that, my response will be that if they have to ask, the odds are that I can't teach them enough about the subject to do any better than they can do just by using the already existing wheel. And this is definitely true if they think they can build their wheel in Perl.

      tilly:
      "As for writing your own storage method, I would strongly discourage people from doing that unless they already know, for instance, the internals of how a dbm works. And if someone comes back and asks me for that, my response will be that if they have to ask, the odds are that I can't teach them enough about the subject to do any better than they can do just by using the already existing wheel. And this is definitely true if they think they can build their wheel in Perl."


      Given precise knowledge of the data's signature (the seeker said something about 12-byte keys, for example), one can build very fine wheels using optimized algorithms, with Perl or without.
      Naturally a starting point would be to look at a DBM implementation, but I wonder why in a Seekers of Perl Wisdom section one would recommend not going the hard way and learning a lot of stuff.

      And if you can't teach him/her, there might either be others who can or the seeker might just go his own way and find out himself.
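To make the fixed-width idea above concrete, here is a minimal sketch (in Python, purely for illustration) of one possible hand-rolled store: records of a known width kept sorted in a flat file and found by binary search. The 12-byte key length comes from the thread; the 20-byte value width, the file layout, and the names `write_sorted` and `find` are assumptions made for this example, not anything the seeker described.

```python
import os
import tempfile

KEY_LEN = 12                      # from the thread: 12-byte keys
VALUE_LEN = 20                    # assumption: values padded to a fixed width
RECORD_LEN = KEY_LEN + VALUE_LEN


def write_sorted(path, records):
    """Write (key, value) byte pairs as fixed-width records, sorted by key."""
    with open(path, "wb") as fh:
        for key, value in sorted(records):
            assert len(key) == KEY_LEN and len(value) <= VALUE_LEN
            fh.write(key + value.ljust(VALUE_LEN, b"\0"))


def find(path, key):
    """Binary-search the file for key; return its value or None."""
    with open(path, "rb") as fh:
        lo, hi = 0, os.path.getsize(path) // RECORD_LEN
        while lo < hi:
            mid = (lo + hi) // 2
            fh.seek(mid * RECORD_LEN)      # jump straight to record number mid
            record = fh.read(RECORD_LEN)
            if record[:KEY_LEN] < key:
                lo = mid + 1
            elif record[:KEY_LEN] > key:
                hi = mid
            else:
                return record[KEY_LEN:].rstrip(b"\0")
    return None


# Hypothetical data: even numbers only, as zero-padded 12-byte keys.
path = os.path.join(tempfile.mkdtemp(), "records.dat")
write_sorted(path, [(b"%012d" % i, b"v%d" % i) for i in range(0, 2000, 2)])
hit = find(path, b"000000000042")
miss = find(path, b"000000000043")
```

This layout only supports lookups on a file built in one pass; inserts, deletes, and concurrent access are exactly the parts an existing dbm already handles.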
        Given precise knowledge of the data's signature (the seeker said something about 12-byte keys, for example), one can build very fine wheels using optimized algorithms, with Perl or without.
        The odds are very, very high that the overhead of working in Perl would wipe out any possible win you would be able to get from knowing that the keys are 12 bytes. While the problem may sound like a fun challenge, this is an example of a common optimization mistake. If you are always looking for ways to rewrite and speed up bits of code, your overall program is almost guaranteed to wind up slower than it would have been if you used good development practices. Why? Simply because you lose sight of the forest for the trees. You spend so long making your code unmaintainable that you are unable to spot the "low-hanging fruit" that inevitably provides the biggest improvements. See the sample section from Code Complete for more on this. (I recommend the whole book, but that is another story.) If you want more Perl-specific optimization advice, try Re (tilly) 1: Optimizations and Efficiency.

        Naturally a starting point would be to look at a DBM implementation, but I wonder why in a Seekers of Perl Wisdom section one would recommend not going the hard way and learning a lot of stuff.
        Perhaps because the section is named Perl Wisdom and not Perl Masochists?

        Reinventing excellent wheels that you can get for free may be good stuff for an algorithms and data structures class. Understanding this stuff may be wonderful for your evolution as a programmer. But deciding to launch into that when you just need to get something done is stupid. And it is an important lesson to learn not to bother doing that, but to instead learn to reuse existing work when and where that is appropriate.

        See Modules Vs. Manual Coding for further discussion.
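As a rough illustration of how little code the "existing wheel" route takes, here is a minimal sketch (in Python, purely for illustration) that uses a stock dbm module as a persistent on-disk hash. The record contents, the file name, and the helper names `build_store` and `lookup` are made up for the example; `dbm.dumb` is chosen only because it ships with every Python install, and a faster dbm backend would normally be preferred for millions of records.

```python
import dbm.dumb
import os
import tempfile


def build_store(path, records):
    """Write (key, value) byte pairs into an on-disk dbm file."""
    with dbm.dumb.open(path, "c") as db:
        for key, value in records:
            db[key] = value


def lookup(path, key):
    """Return the value stored under key, or None if it is absent."""
    with dbm.dumb.open(path, "r") as db:
        return db.get(key)


# Hypothetical data: 12-byte keys, as mentioned in the thread.
path = os.path.join(tempfile.mkdtemp(), "records")
build_store(path, [(b"%012d" % i, b"value-%d" % i) for i in range(1000)])
found = lookup(path, b"000000000042")
missing = lookup(path, b"999999999999")
```

The store behaves like an ordinary hash from the caller's side, while indexing and file layout stay the library's problem.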

        And if you can't teach him/her, there might either be others who can or the seeker might just go his own way and find out himself.
        It would be nice if you were able to see that quote from my perspective and decide to apologize for the intended insult.

        FYI the quote that you were responding to was not an admission of ignorance on my part. Rather it was a comment on how great the gap is between asking the question, "How do dbms work?" and having a chance at writing one that outperforms a good one.

        If you want to disbelieve me, go ahead. In which case for someone with a Perl background and no CS background I would suggest starting at Bricolage: B-Trees and seeing how far you get. That will at least give you a key algorithm. But, for instance, that won't go into the details of how to really do it far enough to understand what any of the key parameters are that people want to tune in real dbms, let alone why they matter...

        Naturally a starting point would be to look at a DBM implementation, but I wonder why in a Seekers of Perl Wisdom section one would recommend not going the hard way and learning a lot of stuff.

        Two reasons, actually.

        1. People generally come to SOPW to get an answer to a problem so they can go and finish what they're doing with a minimum of fuss.
        2. While I have the background and training to learn how to "go the hard way and learn a lot of stuff", I have absolutely no inclination to do so. My time is worth more than attempting to solve a question that has already been solved. I'd much rather spend that time solving an unanswered question and put that solution out to CPAN. That would be a contribution to the community.

        ------
        We are the carpenters and bricklayers of the Information Age.

        Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.