Evanovich has asked for the wisdom of the Perl Monks concerning the following question:

This is probably 50% an SQL question, and 50% a Perl question, but hopefully the monks will forgive me for my impurity :). I am trying to execute the following loop to update a Postgresql table:
for (1..$#{$columns}) {
    $sth = $dbh->prepare("UPDATE yeast1 set $tables[$_] = temp/$tables[$_-1] where yeast1.orf = temp.orf; INSERT INTO profile_index VALUES ('$tables[$_-1]', '$columns->[$_]')");
    $sth->execute;
    $sth = $dbh->prepare("UPDATE yeast1 set $tables[$_-1] = 999 where $tables[$_-1] IS NULL");
    $sth->execute;
}
Yeast1 is a large table: 6265 rows and about 100 columns so far, and growing to about 400. This loop presently takes hours to complete, and gets longer as I load more data into yeast1. Does anyone know how I can optimize this DBI query? Thanks, Evan

Replies are listed 'Best First'.
Re: Perl DBI Performance
by runrig (Abbot) on Aug 18, 2001 at 04:21 UTC
    See Re: A more elegant solution? and consider doing the same sort of thing for your insert AND update statements. All that preparing without placeholders is probably killing your performance.
      Right. I forgot to say: PostgreSQL doesn't support placeholders on table names, I don't think. Someone correct me if I'm wrong.
        Placeholders are emulated by the DBD::Pg module. Check out the docs. I don't know if that'll solve your speed problems, but your table really isn't that big. I don't have experience with PostgreSQL, but I've done multiple updates on 500,000-row tables in MySQL and it's only taken about 10 minutes on decent hardware.
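        Roughly like this, for example - an untested sketch that assumes the same @tables and @columns from the original post. The INSERT only ever changes its values, so it can be prepared once with placeholders; the UPDATEs change column names, which placeholders can't stand in for, so those stay interpolated:

        my $ins = $dbh->prepare("INSERT INTO profile_index VALUES (?, ?)");
        for (1..$#{$columns}) {
            $dbh->do("UPDATE yeast1 set $tables[$_] = temp/$tables[$_-1] where yeast1.orf = temp.orf");
            $ins->execute($tables[$_-1], $columns->[$_]);    # placeholders filled in here
            $dbh->do("UPDATE yeast1 set $tables[$_-1] = 999 where $tables[$_-1] IS NULL");
        }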

        Your poor performance may also be a function of the excessive number of columns in your table. Maybe you should consider redesigning your database to use multiple tables and foreign keys. That might make things better.

        Gary Blackburn
        Trained Killer

Re: Perl DBI Performance
by perrin (Chancellor) on Aug 18, 2001 at 08:48 UTC
    In addition to the comments about using placeholders, you should make sure autocommit is off. You might want to do a commit after every 1000 rows or so that you change. You'll need to experiment to see what interval works best.

    And check those return values! (Or turn on RaiseError.) You're walking around with your shoes untied.
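    A minimal sketch of what I mean (untested; the DSN, user/password, and the commit interval are placeholders you'd substitute for your own):

    use DBI;
    my $dbh = DBI->connect("dbi:Pg:dbname=yeast", $user, $pass,
        { AutoCommit => 0, RaiseError => 1 });   # errors now die instead of being silently ignored

    my $n = 0;
    for (1..$#{$columns}) {
        # ... the prepare/execute work from the original loop ...
        $dbh->commit unless ++$n % 1000;   # commit every 1000 iterations; experiment with the interval
    }
    $dbh->commit;                          # don't forget the final commit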

Re: Perl DBI Performance
by htoug (Deacon) on Aug 19, 2001 at 18:59 UTC
    I would be cautious about having two SQL statements in the same prepare - it is against the DBI spec.
    Some drivers may allow it (it looks as if Postgres does), but they shouldn't.

    Note that I have no Postgres experience, so the following is just general RDB experience.

    I would look at the way the query optimizer handles the queries; a loop of 3 SQL statements executed 400 times should only take a few seconds - and the Perl/DBI side does only take that long. The rest of the time is spent in your database, so that is where you have to look for optimisations.

    You are doing a large number of lookups based on the orf column in yeast1 - is there an index on that column? If not, you probably have a table scan for every update - and that will take some time.
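    If the index is missing, creating one should help a lot - something like this (untested, and the index name is made up):

    $dbh->do("CREATE INDEX yeast1_orf_idx ON yeast1 (orf)");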

    Does the profile_index table have a structure that easily allows inserts?

    And all in all: you are making several hundred updates to the same row, one for each column. It will be several hundred times as fast to collect all these updates into one gigantic statement (if the DB can handle it). There is no reason to go overboard with placeholders and prepare_cached unless you have millions of prepares; just do it the simple way:
    build your statement with the values in place, prepare and execute it.

    That should be a couple of hundred times faster.
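    An untested sketch of what I mean, reusing the @tables/@columns from the original post (the COALESCE used for the NULL clean-up is my own assumption about what your Postgres version supports):

    # One UPDATE carrying all the per-column assignments
    my @set = map { "$tables[$_] = temp/$tables[$_-1]" } 1 .. $#{$columns};
    $dbh->do("UPDATE yeast1 SET " . join(", ", @set) . " WHERE yeast1.orf = temp.orf");

    # One more pass folding every "set to 999 where NULL" fix together
    my @fix = map { "$tables[$_-1] = COALESCE($tables[$_-1], 999)" } 1 .. $#{$columns};
    $dbh->do("UPDATE yeast1 SET " . join(", ", @fix));

    One caveat: within a single UPDATE every SET expression sees the old row values, so if one column's new value is meant to feed into the next column's calculation, this rewrite is not equivalent and those columns would need to stay in separate statements.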

Re: Perl DBI Performance
by busunsl (Vicar) on Aug 20, 2001 at 10:40 UTC
    I'm not that used to Postgres and don't fully understand the where clause in the first update:
    UPDATE yeast1 set $tables[$_] = temp/$tables[$_-1] where yeast1.orf = temp.orf;
    Where does temp.orf come from? From a temporary table?

    If it does, you should have an index on that table on the column orf.

    And you should have an index on yeast1(orf) anyway.

    Try to benchmark the SQL; AFAIK Postgres has a tool to show what's going on when a query runs (EXPLAIN or such).
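    For example, from psql (column names made up here - substitute one of your real pairs):

    EXPLAIN UPDATE yeast1 SET some_col = temp/other_col WHERE yeast1.orf = temp.orf;

    If the plan shows a sequential scan on yeast1 or temp instead of an index scan, the missing indexes are the first thing to fix.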