I'm sure many of us have had to deal, from time to time, with
reading data from a database and caching it in a hash so that
it can be accessed faster than by invoking execute() on a
statement handle every time. This is especially useful in scripts
that do a lot of work with some input data, compare it with
corresponding records in the database, and/or look up the same
records more than once during the course of program execution.
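To make that concrete, here's a rough before/after sketch; the products table, the sku/price columns, and @input_skus are all invented for illustration, and I'm assuming $dbh is an already-connected DBI handle:
# the slow way: hit the database once per lookup
my $sth = $dbh->prepare('select price from products where sku = ?');
for my $sku (@input_skus) {
    $sth->execute($sku);
    my ($price) = $sth->fetchrow_array;
    print "$sku: $price\n";
}

# the cached way: one query up front, then plain hash lookups
my %price_for = map { $_->[0] => $_->[1] }
    @{ $dbh->selectall_arrayref('select sku, price from products') };
print "$_: $price_for{$_}\n" for @input_skus;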
I'm currently working on a script that compares a set of input
data against records in our database. Since the input data comes
from a third-party feed supplier, it is structured differently
than our data, which only adds to the complexity.
Just before I was ready to commit the last revision of my (analyzer) script,
I took another look at my code and -- "oh, my!" -- found a bunch of
repeated code doing 'custom' caching, preparing SQL statements, etc.
That seemed like too much trouble, so I opted to fetch all of the
database data I had to work with into special 'look-up' hashes
(they didn't turn out to be huge, though, so I didn't end up wasting
much memory at run-time :).
One particular subroutine that I'd like to bring to your attention is this one
that actually reads data from a database and stores it in a hash:
sub read_table_to_hash {
    my ($dbh, %args) = @_;
    # example:
    #   statement = "select col1, col2, col3 from foobar"
    #   keys      = [ qw(col2) ]
    #
    # if the returned data is:
    #
    #   col1_val1, col2_val1, col3_val1
    #   col1_val2, col2_val2, col3_val2
    #   col1_val3, col2_val3, col3_val3
    #
    # the hash will look like this (column names come back
    # uppercased via NAME_uc):
    #   {
    #     fields => ['COL1','COL3'],
    #     data   => {
    #       col2_val1 => ['col1_val1','col3_val1'],
    #       col2_val2 => ['col1_val2','col3_val2'],
    #       col2_val3 => ['col1_val3','col3_val3'],
    #     }
    #   }
    #
    my $statement   = $args{statement};
    my $params_aref = $args{params};
    # key fields (will allow easy hash key lookup)
    my $keys = $args{keys}
        or die "keys not defined!";
    my $sth = $dbh->prepare($statement)
        or die $dbh->errstr;
    $sth->execute(@$params_aref) or die $sth->errstr;
    # NAME_uc is only portably available after execute(), and
    # gives us consistently uppercased column names
    my $fields = $sth->{NAME_uc};
    # fetch all records
    my %data_hash;
    my $key = uc $keys->[-1];    # assume a single key only (for now)
    # string comparison rather than a regex match, so a key named
    # COL2 can't accidentally knock out COL22 as well
    @{$data_hash{fields}} = grep { $_ ne $key } @$fields;
    while (my $row = $sth->fetchrow_hashref('NAME_uc')) {
        # store in hash, keyed by the value of the key column
        foreach my $field (@{$data_hash{fields}}) {
            push @{$data_hash{data}{$row->{$key}}}, $row->{$field};
        }
    }
    return \%data_hash;
}
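For example, calling it looks something like this (the customers table and its columns are invented just to show the calling convention):
# assuming $dbh is a connected DBI handle
my $lookup = read_table_to_hash(
    $dbh,
    statement => 'select cust_id, cust_name, balance from customers',
    keys      => [ 'cust_name' ],
);

# later: an O(1) hash lookup instead of another execute()
if (my $rec = $lookup->{data}{'ACME Corp'}) {
    my ($cust_id, $balance) = @$rec;
    print "id=$cust_id, balance=$balance\n";
}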
Although it fit the immediate purpose of my analyzer script,
I now see that it wouldn't hurt to add a few features. The one
I'm thinking of is allowing more than one lookup key.
As in many database queries, it is sometimes necessary to search
for a record by more than one key. If a given column (say, CUSTOMER_NAME)
happens to be indexed in the database, I can find the associated record
pretty quickly -- much like a hash lookup when that same 'column' is
used as the key of the resulting hash. While working with a single key
is trivial (exactly what I have now), building a lookup hash with two
or more keys may be a little tricky; after all, a hash lookup takes
exactly one key. So, what would be the best way of implementing this?
A pretty straightforward way would be to simply build a separate lookup
hash for each key. The only issue I have with that is having to duplicate a lot of data.
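One thought I keep coming back to: since Perl copies references, not data, the separate-hash approach may be cheaper than it looks if every index hash just points at the same shared row arrays. A minimal sketch with toy data (no DBI):
# toy rows standing in for fetched records: [id, name, balance]
my @rows = (
    [ 1, 'ACME Corp', 100 ],
    [ 2, 'Foo Inc',   200 ],
);

# one index hash per lookup key; both store references to the
# SAME arrayrefs, so the row data itself is never copied
my (%by_id, %by_name);
for my $row (@rows) {
    $by_id{ $row->[0] }   = $row;
    $by_name{ $row->[1] } = $row;
}

print $by_name{'ACME Corp'}[2], "\n";    # prints 100
print $by_id{2}[1], "\n";                # prints Foo Inc
(And if what's actually needed is a compound key -- look up by name and region together, say -- Perl's $; subscript separator makes $h{$name,$region} work out of the box, since it's shorthand for $h{join $;, $name, $region}.)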
Suggestions?? ;-)
"There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith