seaver has asked for the wisdom of the Perl Monks concerning the following question:

Dear all

I've profiled a script of mine only to find that the biggest culprit in slowing down the script is the function below.

As you can see, I have several tables, each of which is connected, in 'descending' order, by a unique ID to the table above it.

So every model has the correct atom id, every atom has the correct residue id, and every residue has the correct chain id.

Every line I parse from a file has all the data for all four tables, so I can fill all four tables in one function. The problem is that I want to be able to NOT query for any of the unique IDs.

But I have to write the application so that it can resume progress if it crashes or the computer needs rebooting, etc. If it goes over any of the same files, the data is already there in the database. So how can I speed up the whole function? Every call costs 0.000637 seconds, which is huge given the sheer volume of calls:

%Time    Sec.     #calls   sec/call  F  name
32.16 20144.1603  31635687   0.000637     DBI::st::execute
here's the function:

sub addAtom {
    my $self = shift;
    my $pdb  = shift;
    my $atom = shift;
    my $ch   = $atom->chainId();
    my $dbh  = DBI->connect($dbi, $u, $p, { 'RaiseError' => 1 });
    my $cid  = $self->getCID($pdb, $atom->chainId(), $dbh);
    my ($rid, $aid, $mid, $res, $ato);
    if (!$cid) {
        $dbh->do("INSERT INTO chain (pdb,chain) VALUES ('$pdb','$ch')");
        $cid = $self->getCID($pdb, $atom->chainId(), $dbh);
    }
    if ($cid) {
        $rid = $self->getRID($cid, $atom->resNumber, $atom->resName, $dbh);
        if (!$rid) {
            $res = $dbh->quote($atom->resName());
            $dbh->do("INSERT INTO residue (cid,rnumber,rname) VALUES ('$cid','"
                . $atom->resNumber . "',$res)");
            $rid = $self->getRID($cid, $atom->resNumber, $atom->resName, $dbh);
        }
        if ($rid) {
            $aid = $self->getAID($rid, $atom->atomName, $dbh);
            if (!$aid) {
                $ato = $dbh->quote($atom->atomName());
                $dbh->do("INSERT INTO atom (rid,aname) VALUES ('$rid',$ato)");
                $aid = $self->getAID($rid, $atom->atomName, $dbh);
            }
            if ($aid) {
                $mid = $self->getMID($aid, $atom->model, $dbh);
                if (!$mid) {
                    $dbh->do("INSERT INTO model (aid,model,x,y,z) VALUES ('$aid','"
                        . $atom->model . "','" . $atom->x . "','"
                        . $atom->y . "','" . $atom->z . "')");
                    $mid = $self->getMID($aid, $atom->model, $dbh);
                }
            }
        }
    }
    $dbh->disconnect();
}
Cheers
Sam

UPDATE:

Thanks for the replies, I had actually broached the subject of avoiding repeated connections in this thread.

I think the best solution to my problem is simply to use stored procedures, and also to keep a hash of the 'current' ids.

I have one more question though: is it possible to get the Perl DBI or MySQL to return the newly created ID, so that I don't have to re-query for it?

thanks
Sam

Replies are listed 'Best First'.
Re: Avoiding too many DB queries
by shemp (Deacon) on Jun 17, 2004 at 16:22 UTC
    You are probably losing A LOT more time connecting to and disconnecting from the database on each call to your method. You should connect to the database once and store the handle somewhere for reuse. This appears to be an object method, so why not store the DB handle in the object?
    Also, since you're repeatedly doing effectively the same type of queries, you may want to prepare the standard queries once with bind parameters, so the SQL engine doesn't need to re-parse the queries each time.
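    A minimal sketch of both suggestions together (the method names `dbi_handle` and `prepared` are illustrative, not from the original code): keep one connection and a cache of prepared statement handles in the object, and use placeholders instead of interpolated values.

```perl
use strict;
use warnings;
use DBI;

# DSN and credentials as in the original post (placeholder values here).
our ($dbi, $u, $p) = ('dbi:mysql:pdbdb', 'user', 'pass');

sub dbi_handle {
    my $self = shift;
    # Connect once and keep the handle in the object for reuse.
    return $self->{dbh} ||= DBI->connect($dbi, $u, $p, { RaiseError => 1 });
}

sub prepared {
    my ($self, $sql) = @_;
    # Cache statement handles by SQL text: each query is parsed only once,
    # and the placeholders (?) are filled in safely at execute time.
    return $self->{sth}{$sql} ||= $self->dbi_handle->prepare($sql);
}

# Inside addAtom(), a lookup then becomes e.g.:
#   my $sth = $self->prepared("SELECT cid FROM chain WHERE pdb = ? AND chain = ?");
#   $sth->execute($pdb, $ch);
#   my ($cid) = $sth->fetchrow_array;
```

    With this in place, addAtom() neither connects nor disconnects itself; the handle lives as long as the object does.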
Re: Avoiding too many DB queries
by iburrell (Chaplain) on Jun 17, 2004 at 16:26 UTC
    The big optimization I see is to not make a new connection to the database for each call. I am assuming you are calling this method many times in a loop. Make the connection once, process the loop, and then disconnect.

    Another optimization is to cache the results if you do the same lookups repeatedly. For example, keep a hash that maps atom name to aid. If the mapping is not found, call the database, else use the value.
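    That caching idea can be sketched in plain Perl. The `db_lookup_aid` sub below is a hypothetical stand-in for the real database query (the actual code would call getAID() via DBI); the point is that repeated lookups of the same name hit the database only once.

```perl
use strict;
use warnings;

my %aid_cache;    # atom name -> aid, filled on first lookup
my $db_calls = 0;

# Hypothetical stand-in for the real DBI query in getAID().
sub db_lookup_aid {
    my ($name) = @_;
    $db_calls++;
    my %table = (CA => 1, CB => 2, N => 3);
    return $table{$name};
}

sub cached_aid {
    my ($name) = @_;
    # Hit the database only when this name has not been seen before.
    $aid_cache{$name} = db_lookup_aid($name)
        unless exists $aid_cache{$name};
    return $aid_cache{$name};
}

cached_aid('CA') for 1 .. 1000;   # 1000 lookups...
print "db calls: $db_calls\n";    # ...but only one database call
```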

    Which database are you using? With some databases, it can make sense to rely on the database to check the keys and report an error when a row already exists. Your code then rolls back the transaction and goes on to the next item.

Re: Avoiding too many DB queries
by EvdB (Deacon) on Jun 17, 2004 at 17:04 UTC
    The comments above suggest that you cut out the repeated database connections. This code might be a good place for you to start:
    sub dbi_connect {
        my $self = shift;
        return $$self{dbh} ||= DBI->connect($dbi, $u, $p, { 'RaiseError' => 1 });
    }

    sub dbi_disconnect {
        my $self = shift;
        return 1 unless $$self{dbh};
        my $dbh = $self->dbi_connect;
        $dbh->disconnect || die $dbh->errstr;
        $$self{dbh} = undef;
        return 1;
    }
    Obviously you should only call dbi_disconnect when you are finished with the db handle. Alternatively just let it get disconnected automatically when the object goes out of scope.

    --tidiness is the memory loss of environmental mnemonics

Re: Avoiding too many DB queries
by shemp (Deacon) on Jun 17, 2004 at 18:26 UTC
    In regard to your update question about re-querying for the newly created ids: if the id fields are AUTO_INCREMENT, which I think they must be from the context of your question, use the query 'SELECT LAST_INSERT_ID()'.

    It's still querying the DB, but with almost no overhead, because MySQL stores the most recently created AUTO_INCREMENT value for each connection. So after each insert, just run that query instead of whatever your getRID(), getAID(), etc. methods do.
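    DBI also exposes this without a separate query. A sketch, assuming chain.cid is an AUTO_INCREMENT column and $dbh is an open handle to MySQL (the `insert_chain` sub name is illustrative):

```perl
use strict;
use warnings;

sub insert_chain {
    my ($dbh, $pdb, $ch) = @_;
    $dbh->do("INSERT INTO chain (pdb,chain) VALUES (?,?)", undef, $pdb, $ch);

    # MySQL remembers the last AUTO_INCREMENT value per connection, so
    # either of these avoids re-querying for the row just inserted:
    my $cid = $dbh->{mysql_insertid};                              # DBD::mysql attribute
    # my $cid = $dbh->last_insert_id(undef, undef, 'chain', 'cid'); # portable, DBI 1.38+
    return $cid;
}
```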
Re: Avoiding too many DB queries
by Plankton (Vicar) on Jun 17, 2004 at 16:33 UTC
    Does the database you are using have stored procedures? If so use a stored procedure to do this.

    Plankton: 1% Evil, 99% Hot Gas.
Re: Avoiding too many DB queries
by jeffa (Bishop) on Jun 18, 2004 at 13:05 UTC

    Just a nitpick. Don't do this:

    $dbh->do("INSERT INTO chain (pdb,chain) VALUES ('$pdb','$ch')");
    Do this instead:
    $dbh->do("INSERT INTO chain(pdb,chain) VALUES(?,?)",undef,$pdb,$ch);
    and let DBI do the quoting for you.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      Will this actually reduce any overhead?

      I've got too many statements all over my code to go through and update it as you recommend...

      Cheers
      Sam

        It is more of a security issue. You no longer have to worry about what is in those variables, because DBI is going to make sure they are safe and not trying to break anything. It also means you can prepare the statement beforehand using placeholders and then execute it multiple times with different values; in that case I believe it DOES reduce the overhead.
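        The prepare-once, execute-many pattern looks like this (a sketch; $dbh is assumed to be an open DBI handle, and the atoms are represented here as plain hashrefs):

```perl
use strict;
use warnings;

sub insert_atoms {
    my ($dbh, @atoms) = @_;
    # Parsed by the SQL engine a single time...
    my $sth = $dbh->prepare("INSERT INTO atom (rid,aname) VALUES (?,?)");
    for my $atom (@atoms) {
        # ...then executed repeatedly with different bound values,
        # which DBI quotes safely on each execute.
        $sth->execute($atom->{rid}, $atom->{aname});
    }
}
```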


        ___________
        Eric Hodges