in reply to Using tie to initialize large datastructures

I'm failing to see the compelling reason for designing your system in this fashion. I would look at doing something that has the following characteristics:
  1. Is encapsulated. You (the requesting script) do not know how the thing does what it does. All you care about is that it does what it promises to do, which is retrieve your data.
  2. Is fast to query. You want it to give you the data you request in a minimum amount of time.
  3. Is fast to load. You want it to start up in a minimum amount of time.
  4. Is small. You want it to use the least amount of memory.
Sounds pretty tough, huh? Well, it's not. What you are looking for is not a datastructure, but an object.

YAY-US! You, too, can be a part of the O-O revolution, my friend! You can be HEE-ULLED of your pro-see-ju-rull ways!

What you're looking for is not something that loads all your data at once. That is waaay too slow to load, as I'm sure you've noticed already. You're looking for something that will cache data.

Now, others have suggested using DBI's caching, and that's good, or some sort of memoizing, and that's good, too. I'm suggesting a third method, and that is to write an object that will hide your data-retrieval methods from yourself.

The basic concept is this - you instantiate this object. Then, when you need some data, you ask it for that data, and only that data. It will then check to see if it has it. If it doesn't, then it will go out to the database, get the data, store it within itself, then give it to you. Now, if you ask for that data again (for whatever reason), you will get the data immediately. You don't store the data ... this object does.
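
Here's a minimal sketch of that idea, not a finished design. The class name DataStore, the get method, and the fetcher code ref are all made up for illustration; the fetcher stands in for whatever actually talks to your database (a DBI query, say).

```perl
package DataStore;
use strict;
use warnings;

# Hypothetical class. $args{fetcher} is a code ref standing in for the
# real database query; it is an assumption, not part of any module.
sub new {
    my ($class, %args) = @_;
    my $self = {
        fetcher => $args{fetcher},
        cache   => {},
    };
    return bless $self, $class;
}

# Return the data for $key, going out to the fetcher only the first time.
sub get {
    my ($self, $key) = @_;
    $self->{cache}{$key} = $self->{fetcher}->($key)
        unless exists $self->{cache}{$key};
    return $self->{cache}{$key};
}

package main;

my $lookups = 0;
my $store = DataStore->new(
    fetcher => sub { my ($key) = @_; $lookups++; return "row for $key" },
);

my $first  = $store->get('widget');   # goes out to the "database"
my $second = $store->get('widget');   # served from the object's cache
print "lookups: $lookups\n";          # prints "lookups: 1"
```

The script only ever calls get. Whether the data came from the cache or the database is the object's business.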

This method immediately allows for three things:

  1. You get rid of all those nasty globals. Now, all you have is a file-scoped lexical (the object) that will handle all your data needs.
  2. You can request the same data over and over and not incur a performance penalty. This means that your logic flow is cleaner and clearer. Your routines are more loosely coupled. (This is a good thing, in case you're wondering.)
  3. If you have more than one script that uses these data structures and you end up changing them (and you will), you only change stuff in one place! Think about that - maintenance is made 10x easier. I know I always like that.

Now, you're gonna say "Well, I wrote the object, so I'm storing the data. You're just making my life more complicated."

My answer is simple - "No. You are the script that needs the data, or the general. The object is someone else, a quartermaster if you like. Even though the general puts the quartermaster in his position, he still has to requisition supplies through a known and agreed-upon method."

------
/me wants to be the brightest bulb in the chandelier!

Vote paco for President!

Re (tilly) 1: Why are you even bothering to do it that way?!?
by tilly (Archbishop) on Aug 09, 2001 at 09:22 UTC
    Someone who is aware of how to write tie implementations had better be aware of how to write an object. And someone who wanted to avoid tie for performance reasons is going to be unlikely to want to use an object in the same place since the majority of the slowness in tie is in the method lookup.

    Note that Perl 5.8 is supposed to do a lot to fix the issue, but current versions of Perl have a performance headache while running OO code. (Not that that is normally an important thing to factor into a decision about whether or not to use an OO design...)
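
    To make the connection concrete, here is a small sketch of a tied hash, assuming a made-up class named LazyHash built on Tie::StdHash. It only shows that every read of the tied hash is dispatched to the class's FETCH method; it does not measure anything.

```perl
package LazyHash;
use strict;
use warnings;
require Tie::Hash;
our @ISA = ('Tie::StdHash');

my $fetch_calls = 0;

# Called by Perl on every read of the tied hash.
sub FETCH {
    my ($self, $key) = @_;
    $fetch_calls++;
    # Placeholder for the real lookup; fills the slot on first access.
    $self->{$key} = "value for $key" unless exists $self->{$key};
    return $self->{$key};
}

sub fetch_calls { return $fetch_calls }

package main;

tie my %data, 'LazyHash';
my $first  = $data{alpha};   # dispatches to LazyHash::FETCH
my $second = $data{alpha};   # dispatches again, even though the value is cached
print "FETCH calls: ", LazyHash::fetch_calls(), "\n";   # prints "FETCH calls: 2"
```

    The tie class is already an object with methods, so swapping it for a hand-written object leaves the same method calls in place.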

      Someone who is afraid of the performance penalties for using the "best" algorithms is someone who, in my humble opinion, is suffering from premature optimization. Until you have the system fully up and running and have run benchmarks and heard user complaints, you cannot know that method A is too slow! All you have is theory and, you know what? The best theory and $3.29+tx will get you a grande cafe mocha.

        I already pointed out that performance issues are usually irrelevant in the question of whether to choose an OO design. However, algorithms are also often irrelevant in whether to choose an OO design. Generally speaking, you can write the same algorithms in any programming paradigm. OK, so some of them come more naturally in one paradigm than another, but the ones that come most naturally in an OO design are often not particularly efficient. Quite the contrary, in fact.

        For instance your memoizing object is the same algorithm as using an access function that memoizes its return. And using Dominus' Memoize module takes less setup work.
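
        A rough sketch of the access-function version, assuming a made-up fetch_record function that stands in for the real database lookup. Memoize and its memoize() call are real; everything else here is illustrative.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# Hypothetical access function; the body stands in for the real query.
sub fetch_record {
    my ($key) = @_;
    $calls++;
    return "row for $key";
}

# Replace fetch_record with a caching wrapper around the original.
memoize('fetch_record');

my $first  = fetch_record('widget');   # runs the function body
my $second = fetch_record('widget');   # answered from Memoize's cache
print "calls: $calls\n";               # prints "calls: 1"
```

        The setup is one line, and callers keep using the plain function name.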

        Now, programming paradigms have a huge effect on overall program design. They affect how easy it is to swap out one algorithm for another (in obvious and not so obvious ways). I am not saying that all paradigms are created equal. (OTOH I think that which one is "best" depends on circumstances.) But programmers who think that raw performance is a good reason to choose one paradigm over another are generally wrong. Likewise, decent paradigms are not distinguished by which algorithms they can express. (Now ease of expression is a different story...)