comment on

I'm sure from time to time many of us had to deal with reading data from the database and caching it in a hash in order to be able to access it faster than if you had to invoke execute() on your statement handler object every time. This is especially useful in scripts that do a lot of work with some input data, compare it with corresponding records in the database, and/or lookup same records more than once during the course of program execution.

Currently, I deal with a script that compares a set of input data against records in our database. Since input data is coming from a 3rd party feed supplier, it is structured differently than our data is, which only adds to the complexity.

Just before I was ready to commit last revision of my (analyzer) script, I took another look at my code and -- "oh, my!" -- found a bunch of repeated code to do 'custom' caching, preparing SQL statements and etc. I thought it was too much trouble doing things that way.. so, I opted to fetch all of database data that I had to work with into special 'look-up' hashes (they didn't turn out to be huge though, so I didn't end up wasting much memory at run-time :).

One particular subroutine that I'd like to bring to your attention is this one that actually reads data from a database and stores it in a hash:

sub read_table_to_hash {
  my ($dbh, %args) = @_;

  # example:
  # statement = "select col1, col2, col3 from table foobar"
  # keys = qw(col2)
  #
  # if return data is:
  # 
  # col1_val1, col2_val1, col3_val1 
  # col1_val2, col2_val2, col3_val2 
  # col1_val3, col2_val3, col3_val3 
  #
  # hash will look like this:
  # {
  #  fields => ['col1','col3']
  #  data => {
  #           col2_val1 => ['col1_val1','col3_val1'],
  #           col2_val2 => ['col1_val2','col3_val2'],
  #           col2_val3 => ['col1_val3','col3_val3'],
  #  } 
  # }
  #

  my $statement= $args{statement};
  my $params_aref = $args{params};

  # key fields (will allow easy hash key
  # lookup)
  my $keys = $args{keys} 
    or die "keys not defined!"; 

  my $sth = $dbh->prepare($statement);
  my $fields = $sth->{NAME};
  $sth->execute(@$params_aref) or die $sth->errstr;

  # fetch all records
  my %data_hash;
  my $key = uc(pop @$keys); # assume a single key only (for now)
  @{$data_hash{fields}} = grep(!/$key/, @$fields);

  while (my $row = $sth->fetchrow_hashref()) {
    # store in hash

    foreach my $field (@{$data_hash{fields}}) {            
      push(@{$data_hash{data}{$row->{$key}}}, $row->{$field});
    }
  }

  return \%data_hash;

}
[download]

Although it did fit the immediate purpose of my analyzer script, I now see that it wouldn't hurt to add a few features. One that I think of is allowing the use of more than one (lookup) key.

For example, as in many database queries, it sometimes is necessary to search for a record by more than one key. So, if a given column (say, CUSTOMER_NAME) happens to be indexed in the database, i would find associated record pretty quickly. This is pretty similar to how a hash would work if the same 'column' was used as a key in the resultant hash. While working with a single key is trivial (exactly what I have now), building a lookup hash with two or more keys may be a little tricky. For one, a hash table can only have one key. So, what would be the best way of implementing this?

A pretty straightforward way would be to simply use separate lookup hashes for each key. The only issue I have with this is having to duplicate a lot of data.

Suggestions?? ;-)

"There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith

In reply to using (lookup) hashes to store/cache database data by vladb

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.