I'm sure many of us have, from time to time, had to read data from a database and cache it in a hash so we could get at it faster than by invoking execute() on a statement handle every time. This is especially useful in scripts that do a lot of work with some input data, compare it against corresponding records in the database, and/or look up the same records more than once during the course of program execution.

Currently, I'm working on a script that compares a set of input data against records in our database. Since the input data comes from a 3rd-party feed supplier, it is structured differently than our data, which only adds to the complexity.


Just before I was ready to commit the last revision of my (analyzer) script, I took another look at my code and -- "oh, my!" -- found a bunch of repeated code doing 'custom' caching, preparing SQL statements, and so on. That seemed like too much trouble, so I opted to fetch all of the database data I had to work with into special 'look-up' hashes (they didn't turn out to be huge, so I didn't end up wasting much memory at run-time :).

One particular subroutine that I'd like to bring to your attention is this one that actually reads data from a database and stores it in a hash:
sub read_table_to_hash {
    my ($dbh, %args) = @_;

    # example:
    #   statement = "select col1, col2, col3 from foobar"
    #   keys      = qw(col2)
    #
    # if the returned data is:
    #
    #   col1_val1, col2_val1, col3_val1
    #   col1_val2, col2_val2, col3_val2
    #   col1_val3, col2_val3, col3_val3
    #
    # the hash will look like this:
    #
    #   {
    #     fields => ['COL1','COL3'],    # column names are uppercased
    #     data   => {
    #       col2_val1 => ['col1_val1','col3_val1'],
    #       col2_val2 => ['col1_val2','col3_val2'],
    #       col2_val3 => ['col1_val3','col3_val3'],
    #     }
    #   }

    my $statement   = $args{statement};
    my $params_aref = $args{params};
    # key fields (will allow easy hash key lookup)
    my $keys = $args{keys} or die "keys not defined!";

    my $sth = $dbh->prepare($statement);
    $sth->execute(@$params_aref) or die $sth->errstr;

    # uppercased column names; the NAME_* attributes are only
    # reliable after execute()
    my $fields = $sth->{NAME_uc};

    my %data_hash;
    my $key = uc(pop @$keys);    # assume a single key only (for now)

    # every column except the key column becomes part of the row data
    @{$data_hash{fields}} = grep { $_ ne $key } @$fields;

    # fetch all records, keyed by the value of the key column
    while (my $row = $sth->fetchrow_hashref('NAME_uc')) {
        foreach my $field (@{$data_hash{fields}}) {
            push(@{$data_hash{data}{$row->{$key}}}, $row->{$field});
        }
    }
    return \%data_hash;
}
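For reference, here's what a call would look like (the table and column names below are made up for illustration; note that the sub uppercases the column names it stores in fields):

my $customers = read_table_to_hash($dbh,
    statement => 'select cust_id, cust_name, cust_phone from customers',
    keys      => ['cust_id'],
);

# fields is ['CUST_NAME','CUST_PHONE'], so for the record
# whose cust_id is 1001:
my ($name, $phone) = @{ $customers->{data}{1001} };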
Although it fit the immediate purpose of my analyzer script, I now see that it wouldn't hurt to add a few features. One that comes to mind is allowing the use of more than one (lookup) key.

For example, as in many database queries, it sometimes is necessary to search for a record by more than one key. So, if a given column (say, CUSTOMER_NAME) happens to be indexed in the database, I would find the associated record pretty quickly. This is pretty similar to how a hash works when the same 'column' is used as the key of the resultant hash. While working with a single key is trivial (exactly what I have now), building a lookup hash with two or more keys may be a little tricky. For one, a hash can only be indexed by a single key. So, what would be the best way of implementing this?

A pretty straightforward way would be to simply use separate lookup hashes for each key. The only issue I have with this is having to duplicate a lot of data.

Suggestions?? ;-)

"There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith

Replies are listed 'Best First'.
Re: using (lookup) hashes to store/cache database data
by gav^ (Curate) on Apr 17, 2002 at 05:55 UTC
    I've found that if I'm going to do a lot of data accesses and the data is pretty static, then loading it all into a hash and saving it to disk with Storable can be a real time-saver. This works really well if you are tweaking the output; for example, I was creating HTML pages from MySQL and it took quite a few queries per page. I started by just dumping the data structure with Data::Dump on each page, then refined the output until I was happy. If I had to reload the data from the DB each time, this could have been quite painful.
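
    A rough sketch of that, reusing your read_table_to_hash (the cache file name and query are made up, and you'd probably also want some staleness check on the file):

        use Storable qw(store retrieve);

        my $cache_file = 'lookup.cache';    # made-up cache location

        my $lookup;
        if (-e $cache_file) {
            # fast path: no database queries at all
            $lookup = retrieve($cache_file);
        } else {
            $lookup = read_table_to_hash($dbh,
                statement => 'select col1, col2, col3 from foobar',
                keys      => ['col2'],
            );
            store($lookup, $cache_file);    # save for the next run
        }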

    gav^

Re: using (lookup) hashes to store/cache database data
by perrin (Chancellor) on Apr 17, 2002 at 12:59 UTC
    The only issue I have with this is having to duplicate a lot of data.

    If you have both hashes point to references to the same data structure, the only duplicate data will be the hash keys.
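
    Something like this, as a rough sketch (column and key names are made up):

        my (%by_id, %by_name);
        while (my $row = $sth->fetchrow_hashref()) {
            my $rec = { %$row };                   # one copy of the row data...
            $by_id{$rec->{CUST_ID}}     = $rec;    # ...shared by
            $by_name{$rec->{CUST_NAME}} = $rec;    # both hashes
        }
        # $by_id{1001} and $by_name{'Fred'} are the same reference,
        # so the row data itself is never duplicated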

Re: Using Hashes to Cache Database Data
by tadman (Prior) on Apr 17, 2002 at 07:44 UTC
    If you're using something like mod_perl, you can just save the results of a DBI fetchall_hashref() call and reuse them. For things that change extremely rarely, like province/state abbreviations, country codes, and so forth, you can presumably cache this "forever", or at least until the next restart. Where your data does change occasionally, you just expire it now and then using undef and let your resolver re-cache fresh data.
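
    Roughly like this, using selectall_hashref (the $dbh-level convenience form of fetchall_hashref) -- the table and column names are made up, and under mod_perl the package variable persists across requests:

        package MyApp::Lookup;
        use strict;

        my %cache;    # lives as long as the interpreter does

        sub states {
            my ($dbh) = @_;
            $cache{states} ||= $dbh->selectall_hashref(
                'select abbrev, name from states', 'abbrev');
            return $cache{states};
        }

        # expire now and then so the next call re-caches fresh data
        sub expire_states { undef $cache{states} }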

    The more daring approach would use something like DBIx::Cache, which does seem to promise a lot, but comes with a word of warning that it is experimental and "for exploratory use only". A more robust version could be very useful indeed.