in reply to PERL DB Optimization
Before I offer my thoughts on your question, I must first comment on one issue I see in your code, namely that it does not use bind params. Apologies if you already know this, but...
SQL queries should follow this form:

    my $sth = $dbh->prepare('update data_set2 set data = ? where key2 = ?');
    $sth->execute($val1, $key);

This allows you to gain several benefits, including proper escaping of your data, cacheability of statement handles (i.e. using the same statement handle for multiple updates, which saves time because the DB server does not have to re-parse the SQL), and you don't get people complaining at you to use bind params ;)
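To make the statement-handle reuse concrete, here is a minimal sketch. The helper name update_data_set2 and the shape of its $updates argument are made up for illustration, and $dbh is assumed to be a DBI handle you have already connected.

```perl
use strict;
use warnings;

# Hypothetical helper: $dbh is assumed to be an already-connected DBI handle,
# and $updates is a hash ref mapping key2 values to their new data values.
sub update_data_set2 {
    my ($dbh, $updates) = @_;

    # Prepare once; the placeholders (?) are filled in on each execute.
    my $sth = $dbh->prepare('update data_set2 set data = ? where key2 = ?');

    my $count = 0;
    for my $key (sort keys %$updates) {
        # Bind values go in placeholder order: data first, then key2.
        $sth->execute($updates->{$key}, $key);
        $count++;
    }
    return $count;    # number of execute calls made
}
```

The point is that prepare sits outside the loop and only execute runs per row, so the SQL is parsed once no matter how many rows you update.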
Now, on to your question.
I have done several different styles of what you are trying to do, and the highest performance I have been able to get out of the process is by putting more intelligence in the SQL and less in the Perl. If you can make the database do more work, the Perl has to work with less and will therefore run faster. If you do not have a database that supports subselects and joins, this may be harder than it otherwise would be.
For instance, your example looks like 2 problems. The first can be handled by getting a list from the database of only those records that don't exist, i.e.

    SELECT id_field, field_to_update
      FROM table_1
     WHERE id_field NOT IN (SELECT join_id_field FROM table_2)

and the second can be handled with slightly more complicated SQL, like this:

    SELECT src.id_field, src.field_to_update
      FROM table_1 src, table_2 dest
     WHERE src.id_field = dest.join_id_field
       AND src.field_to_update != dest.field_to_be_updated

This will get you a list of the records that need to be updated.
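As a rough sketch of how those two result sets might be consumed from Perl: the helper name sync_table_2 is hypothetical, $dbh is assumed to be a connected DBI handle, and I am assuming that missing rows should be inserted into table_2 and differing rows overwritten, which may not match what you actually need.

```perl
use strict;
use warnings;

# Hypothetical helper: $dbh is assumed to be a connected DBI handle.
# Assumption: rows missing from table_2 get inserted, and rows that differ
# get table_2.field_to_be_updated overwritten with the table_1 value.
sub sync_table_2 {
    my ($dbh) = @_;

    # Rows in table_1 with no counterpart in table_2.
    my $missing = $dbh->selectall_arrayref(
        'SELECT id_field, field_to_update FROM table_1
          WHERE id_field NOT IN (SELECT join_id_field FROM table_2)'
    );

    # Rows present in both tables whose values differ.
    my $changed = $dbh->selectall_arrayref(
        'SELECT src.id_field, src.field_to_update
           FROM table_1 src, table_2 dest
          WHERE src.id_field = dest.join_id_field
            AND src.field_to_update != dest.field_to_be_updated'
    );

    # One prepared handle per statement, reused for every row.
    my $ins = $dbh->prepare(
        'INSERT INTO table_2 (join_id_field, field_to_be_updated) VALUES (?, ?)'
    );
    my $upd = $dbh->prepare(
        'UPDATE table_2 SET field_to_be_updated = ? WHERE join_id_field = ?'
    );

    for my $row (@$missing) {
        my ($id, $value) = @$row;
        $ins->execute($id, $value);
    }
    for my $row (@$changed) {
        my ($id, $value) = @$row;
        $upd->execute($value, $id);
    }

    # Counts of rows inserted and rows updated.
    return (scalar @$missing, scalar @$changed);
}
```

Perl only ever sees the rows that need work, which is where the speedup comes from. One caveat: the != comparison will not match rows where either side is NULL, so those would need separate handling.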
To take this a step further, if your database supports it, you can even do the updates purely on the database side, although that query is much more difficult and I do not have the time or inclination to figure it out for an example problem :)
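For what it's worth, one possible shape for a database-side update is a correlated subquery. This is only a sketch, not something tested against your schema: the helper name is hypothetical, $dbh is assumed to be a connected DBI handle, the SQL assumes at most one table_1 row per join_id_field, and UPDATE syntax of this kind varies between databases.

```perl
use strict;
use warnings;

# Hypothetical helper: $dbh is assumed to be a connected DBI handle.
sub update_table_2_in_database {
    my ($dbh) = @_;

    # Correlated subquery: each table_2 row pulls its new value from the
    # matching table_1 row. Assumes at most one table_1 row per join_id_field.
    # The EXISTS clause limits the update to rows whose values differ.
    my $sql = <<'SQL';
UPDATE table_2
   SET field_to_be_updated = (
         SELECT src.field_to_update
           FROM table_1 src
          WHERE src.id_field = table_2.join_id_field
       )
 WHERE EXISTS (
         SELECT 1
           FROM table_1 src
          WHERE src.id_field = table_2.join_id_field
            AND src.field_to_update != table_2.field_to_be_updated
       )
SQL

    # do() prepares and executes in one call; on success it returns the
    # affected row count as reported by the driver.
    return $dbh->do($sql);
}
```

No rows travel to Perl at all here, so if your database accepts this form it should do the least work on the client side.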
Hope that helps