
Dear all

I've profiled a script of mine and found that the biggest culprit slowing it down is the function below.

As you can see, I have several tables, each linked in 'descending' order to the table above it by that table's unique ID.

So every model has the correct atom ID, every atom has the correct residue ID, and every residue has the correct chain ID.

Every line I parse from a file has the data for all four tables, so I can fill all four tables in one function. The problem is that I would like to avoid querying for any of the unique IDs.

However, the application has to be able to resume if it crashes or the computer needs rebooting, so when it goes over files it has already processed, the data is already in the database. How can I speed up the whole function? Each DBI::st::execute call takes 0.000637 seconds, which is huge given the sheer number of calls:

%Time    Sec.     #calls   sec/call  F  name
32.16 20144.1603  31635687   0.000637     DBI::st::execute

Here's the function:

sub addAtom {
    my ($self, $pdb, $atom) = @_;
    my $ch = $atom->chainId();

    # Opens a new database connection on every call;
    # $dbi, $u and $p are defined elsewhere in the script.
    my $dbh = DBI->connect($dbi, $u, $p, {'RaiseError' => 1});
    my ($rid, $aid, $mid);

    # chain: look up the id, inserting the row first if it is missing.
    # Placeholders (?) let DBI quote every value, including $pdb and $ch.
    my $cid = $self->getCID($pdb, $ch, $dbh);
    if (!$cid) {
        $dbh->do("INSERT INTO chain (pdb,chain) VALUES (?,?)", undef, $pdb, $ch);
        $cid = $self->getCID($pdb, $ch, $dbh);
    }
    if ($cid) {
        # residue, linked to its chain by $cid
        $rid = $self->getRID($cid, $atom->resNumber, $atom->resName, $dbh);
        if (!$rid) {
            $dbh->do("INSERT INTO residue (cid,rnumber,rname) VALUES (?,?,?)",
                undef, $cid, $atom->resNumber, $atom->resName);
            $rid = $self->getRID($cid, $atom->resNumber, $atom->resName, $dbh);
        }

        if ($rid) {
            # atom, linked to its residue by $rid
            $aid = $self->getAID($rid, $atom->atomName, $dbh);
            if (!$aid) {
                $dbh->do("INSERT INTO atom (rid,aname) VALUES (?,?)",
                    undef, $rid, $atom->atomName);
                $aid = $self->getAID($rid, $atom->atomName, $dbh);
            }

            if ($aid) {
                # model (coordinates), linked to its atom by $aid
                $mid = $self->getMID($aid, $atom->model, $dbh);
                if (!$mid) {
                    $dbh->do("INSERT INTO model (aid,model,x,y,z) VALUES (?,?,?,?,?)",
                        undef, $aid, $atom->model, $atom->x, $atom->y, $atom->z);
                    $mid = $self->getMID($aid, $atom->model, $dbh);
                }
            }
        }
    }
    $dbh->disconnect();
}
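A minimal sketch of the usual first step, assuming the script can open one database handle up front and pass it in: connect once for the whole run, and let prepare_cached reuse the statement handles so each lookup-or-insert is prepared once per connection rather than once per atom. It uses an in-memory SQLite database (DBD::SQLite) in place of MySQL so it runs standalone; the chain_id helper and the UNIQUE (pdb, chain) schema are hypothetical stand-ins for getCID and the real chain table. Because the helper looks the row up before inserting, rerunning it over an already-processed file finds the existing rows instead of duplicating them.

```perl
use strict;
use warnings;
use DBI;

# Connect ONCE for the whole run, not once per atom.
# In-memory SQLite (DBD::SQLite) stands in for MySQL in this sketch.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE chain (cid INTEGER PRIMARY KEY, pdb TEXT, chain TEXT,
          UNIQUE (pdb, chain))');

# Hypothetical replacement for getCID plus its INSERT: return the chain id,
# creating the row only if it does not exist yet (safe to rerun after a crash).
sub chain_id {
    my ($dbh, $pdb, $chain) = @_;
    # prepare_cached hands back the same statement handle on every call,
    # so the SQL is prepared once per connection instead of once per atom
    my $sel = $dbh->prepare_cached('SELECT cid FROM chain WHERE pdb = ? AND chain = ?');
    my ($cid) = $dbh->selectrow_array($sel, undef, $pdb, $chain);
    return $cid if defined $cid;

    my $ins = $dbh->prepare_cached('INSERT INTO chain (pdb, chain) VALUES (?, ?)');
    $ins->execute($pdb, $chain);
    ($cid) = $dbh->selectrow_array($sel, undef, $pdb, $chain);
    return $cid;
}

for my $ch (qw(A A B)) {
    # prints: chain A -> cid 1, chain A -> cid 1, chain B -> cid 2
    print "chain $ch -> cid ", chain_id($dbh, '1abc', $ch), "\n";
}
```

The same pattern would carry over to the residue, atom and model lookups, each with its own cached SELECT and INSERT handles.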

Cheers
Sam

UPDATE:

Thanks for the replies. I had actually raised the subject of avoiding repeated connections in this thread.

I think the best solution to my problem is simply to use stored procedures, and also to keep a hash of the 'current' ids.
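A minimal pure-Perl sketch of the 'current ids' idea, assuming the parsed lines arrive grouped by chain and residue (as they do in a PDB file): remember the last key and id per table, and only go to the database when the key changes. The current_id helper and the $fetch code ref are hypothetical; in the real script $fetch would wrap the existing getCID/getRID lookup-or-insert. After a crash the cache starts empty, so the first lookup for each key still reaches the database and finds the rows already written.

```perl
use strict;
use warnings;

# Hypothetical cache of the 'current' id per table: table => [key, id]
my %current;

# Return the id for @key in $table. Only when the key differs from the
# last one seen for that table is $fetch (the real SELECT/INSERT) called.
sub current_id {
    my ($table, $fetch, @key) = @_;
    my $k = join "\0", @key;
    my $c = $current{$table};
    return $c->[1] if $c && $c->[0] eq $k;
    my $id = $fetch->(@key);
    $current{$table} = [$k, $id];
    return $id;
}

# Demo: a stub fetcher that counts how often the database would be hit.
my $db_hits = 0;
my $next_id = 1;
my %fake_chain_table;
my $fetch_chain = sub {
    my ($pdb, $chain) = @_;
    $db_hits++;
    return $fake_chain_table{"$pdb/$chain"} //= $next_id++;
};

for my $ch (qw(A A A B B)) {
    print "chain $ch -> cid ", current_id('chain', $fetch_chain, '1abc', $ch), "\n";
}
print "database hits: $db_hits\n";   # 2, not 5
```

Keeping only the latest key per table bounds memory, which matters with tens of millions of atom rows; a key that reappears later simply costs one more lookup.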

I have one more question, though: is it possible to get Perl DBI or MySQL to return the newly created ID, so that I don't have to query for the ID again?
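DBI provides this through $dbh->last_insert_id, which returns the key generated by the most recent INSERT on that handle; DBD::mysql also exposes the same value as $dbh->{mysql_insertid}, and in SQL it is LAST_INSERT_ID(). A minimal sketch, assuming rid is an auto-increment primary key, again using in-memory SQLite in place of MySQL and a hypothetical insert_residue helper:

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite stands in for MySQL so the sketch runs standalone
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE residue (rid INTEGER PRIMARY KEY, cid INTEGER,
          rnumber INTEGER, rname TEXT)');

my $ins = $dbh->prepare('INSERT INTO residue (cid, rnumber, rname) VALUES (?, ?, ?)');

# Hypothetical helper: insert a residue and return the id the database
# generated for it, with no second SELECT.
# (With DBD::mysql, $dbh->{mysql_insertid} holds the same value.)
sub insert_residue {
    my ($cid, $rnumber, $rname) = @_;
    $ins->execute($cid, $rnumber, $rname);
    return $dbh->last_insert_id(undef, undef, 'residue', 'rid');
}

my $rid1 = insert_residue(1, 10, 'ALA');
my $rid2 = insert_residue(1, 11, 'GLY');
print "rids: $rid1 $rid2\n";   # rids: 1 2
```

This only saves the re-query after an INSERT; the initial lookup is still needed to find out whether the row already exists when resuming.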

thanks
Sam

In reply to Avoiding too many DB queries by seaver
