in reply to Using tie to initialize large datastructures

It would be helpful if you posted some code and example data. This sounds as if you read the entire database into memory. If so, that will certainly slow things down.

Are you using DBI.pm to read the data? A few suggestions:

  1. Look into cached database connections.
  2. Consider cached, prepared select statements with placeholders.
  3. Look at what DBI's column binding (bind_col(), bind_columns()) can do for you.
  4. Get ruthless with globals: replace them with lexicals that hold only the data you need.
  5. Benchmark. No rule of thumb can replace actual performance measurements.
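
As a rough illustration of #5, the core Benchmark module can compare two ways of getting at the same data. This is only a sketch: `fetch_zip_codes` and `cached_zip_codes` are hypothetical stand-ins, and the "fetch" builds a list in memory where real code would query the database.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Stand-in for an expensive fetch; a real one would hit the database.
sub fetch_zip_codes { return [ map { sprintf '%04d', $_ } 1000 .. 1999 ] }

# Same data, but fetched only on the first call.
my $cached;
sub cached_zip_codes { return $cached ||= fetch_zip_codes() }

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    refetch => sub { my $n = @{ fetch_zip_codes() } },
    cached  => sub { my $n = @{ cached_zip_codes() } },
});
```

The absolute numbers mean little here; the point is to measure the real fetch against the alternatives before redesigning anything.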

A tie class may be useful, but a few subs that return data for a given key are likely to work as well. Of course, all this is speculative and not necessarily useful.
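
A minimal sketch of the "few subs" idea, assuming one accessor that takes a table name as its key, loads that table on first request, and keeps it in a lexical cache. The names `constants_for`, `%loader` and `%calls` are hypothetical, and the loaders return hard-coded data where real ones would run a query.

```perl
use strict;
use warnings;

# Hypothetical loaders: in real code each would query the database.
my %calls;   # counts loads per table, for illustration only
my %loader = (
    zip  => sub { $calls{zip}++;  return [ '1000', '2000', '8000' ] },
    dept => sub { $calls{dept}++; return [ 'Sales', 'Support' ] },
);

{
    my %cache;   # lexical, visible only to the accessor below

    # Return the rows for a table, loading them on first use only.
    sub constants_for {
        my ($table) = @_;
        my $load = $loader{$table}
            or die "unknown constant table '$table'";
        $cache{$table} = $load->() unless exists $cache{$table};
        return $cache{$table};
    }
}

print "zip: $_\n" for @{ constants_for('zip') };
print "zip: $_\n" for @{ constants_for('zip') };   # served from the cache
```

Tables that are never asked for are never loaded, so memory grows only with what a process actually uses.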

After Compline,
Zaxo

Update: Changed list to numbered format for reference.
Thanks for the extra info on your requirements. I'm sorry to admit that I'm unfamiliar with any of the mechanisms you cite (Apache::ASP, CORBA through the COPE modules). You don't appear to use either CGI.pm or DBI.pm. (Update2: I'm informed that htoug uses DBI.)
You might try #5 'Benchmark' right away, to see where the resource hogs are.
Given your security requirements, #4 is all the more important.
As a design issue, I'd suggest starting from the user interface and seeing how few SQL statements you need to support it.

Re: Re: Using tie to initialize large datastructures
by htoug (Deacon) on Aug 08, 2001 at 12:44 UTC
    I'm definitely not trying to read the entire database into memory. That would take about 40 GB, and we don't have that much available for each Apache process!
    The system is a 3-tier system: an Apache frontend (written using Apache::ASP, handling the formatting of data, session handling, etc.), a set of application servers (communicating with Apache using CORBA through the COPE modules), and the database (about 40 GB of data in ~800 tables, the largest containing more than 130 million rows, accessed using DBI, DBD::Ingres {which I wrote}, etc.), all on different machines. The database contains very sensitive data, so security is important.

    We have some (about 50-100) tables that contain things like zip codes, department addresses, type codes, and so on ad nauseam. Some are small, some are big, others huge; it varies.

    In the frontend code (on Apache) we need, for example, to create select boxes that let the user choose between different options, based on the content of the constant tables.

    One possibility would be to fetch the data every time it is needed:

        my $zip = $zip_server->get_zip_codes();
        print "selectbox-header";
        for (@$zip) { print "selectbox line"; }
        print "selectbox-end";
    or something like that.

    This will take quite a while and soon you discover the need for caching the data. So you try something like:

        # ...in common initialisation code...
        our $zip;
        $zip = $zip_server->get_zip_codes();

        # ...where the zip-code is needed...
        print "selectbox-header";
        for (@$zip) { print "selectbox line"; }
        print "selectbox-end";

    This is fast, but it takes more and more memory as the number of constants rises. So the next version could be something like:

        # ...in the common initialisation code...
        our $zip;
        sub zip_init {
            $zip = $zip_server->get_zip_codes() unless $zip;
        }

        # ...at every use...
        zip_init();
        print "selectbox-header";
        for (@$zip) { print "selectbox line"; }
        print "selectbox-end";
    That is fast, easy, and does not consume unnecessary amounts of memory. The downside is that you have to remember to call zip_init before you use $zip.
    Sometimes you forget, and spend an excessive amount of time scratching your head and trying to fathom what went wrong.

    So I would like something like:

        # ...in initialisation section...
        our $zip;
        tie $zip ....   # magic here

        sub ZIP::TIE::FETCH {
            # smoke and mirrors here
            $zip = $zip_server->get_zip_codes();
            untie $zip;   # and leave the data in $zip
        }

        # ...and where we use it...
        print "selectbox-header";
        for (@$zip) { print "selectbox line"; }
        print "selectbox-end";
    Note: no zip_init or fetch calls, just plain ordinary access to a variable.

    At the first reference to the variable the tie magic kicks in, retrieves the data, and removes the magic, leaving the 'naked' variable.
    This gives:

    • no need to remember the initialisation incantations (we all forget things too often)
    • no performance overhead

    Did that clarify what I need?

      I think you are trying to solve the wrong problem.

      First of all, gratuitous globals are a sign of poor design. I would use an access function, and (depending on what made sense) I would have it memoize results. It is a much cleaner design, and your issue never arises. Unless your program is truly performance sensitive (the odds are very low that it is), trying to optimize beforehand at the expense of maintainability is a losing game.
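
      A minimal sketch of such an access function, assuming the core Memoize module does the caching. The name `zip_codes` and the `$fetches` counter are hypothetical, and the function returns hard-coded data where real code would ask the zip server.

      ```perl
      use strict;
      use warnings;
      use Memoize;

      my $fetches = 0;   # counts real fetches, for illustration only

      # Hypothetical access function; real code would ask the zip server.
      sub zip_codes {
          $fetches++;
          return [ '1000', '2000', '8000' ];
      }

      # Install a caching wrapper: the body above now runs once per
      # distinct argument list and calling context (here, once).
      memoize('zip_codes');

      print "selectbox-header\n";
      print "selectbox line $_\n" for @{ zip_codes() };
      print "selectbox-end\n";

      my $again = zip_codes();   # answered from the cache, no second fetch
      ```

      Callers never see a global, and there is no init call to forget: the first call pays for the fetch and later calls get the cached reference.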

      However the second issue is technical. In the middle of calling an implementation of a tie, you don't have access to information about the tie. A tie just replaces a data structure with a wrapper around an object. But from the point of view of the object call, it is just an object call. You are not told what variable you are being called with, and said variable may not even be in any scope you can access. (Think about tying a lexical variable.)

      Now the technical issue I could find a hack around. But the maintainability issue makes me really not want to...
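
      For the curious, one possible shape for such a hack is sketched below. It is an illustration only, not a recommendation: the class name `LazyScalar` and the loader callback are hypothetical, and it works around the "which variable?" problem by handing the tie a reference to the variable it is tied to.

      ```perl
      use strict;
      use warnings;

      package LazyScalar;   # hypothetical class name

      # Usage: tie $var, 'LazyScalar', \$var, $loader;
      # The caller passes a reference to the tied variable, because FETCH
      # is not otherwise told which variable it is serving.
      sub TIESCALAR {
          my ($class, $var_ref, $loader) = @_;
          return bless { var => $var_ref, loader => $loader }, $class;
      }

      sub FETCH {
          my ($self) = @_;
          my ($var_ref, $loader) = @{$self}{qw(var loader)};
          my $value = $loader->();
          no warnings 'untie';   # $self is still held while we untie
          untie $$var_ref;       # remove the magic ...
          $$var_ref = $value;    # ... and leave the plain data behind
          return $value;
      }

      package main;

      my $loads = 0;   # counts loader calls, for illustration only
      our $zip;
      tie $zip, 'LazyScalar', \$zip, sub { $loads++; return [ '1000', '2000' ] };

      print "selectbox line $_\n" for @$zip;   # first access loads the data
      print "selectbox line $_\n" for @$zip;   # plain variable from here on
      ```

      The sketch defines no STORE, so assigning to the variable before the first read is not handled, and it does nothing to answer the maintainability objection.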