Before I offer my thoughts on your question, I must first comment on one issue I see in your code: it does not use bind params. Apologies if you already know this, but...

SQL queries should follow this form:

my $sth = $dbh->prepare('update data_set2 set data = ? where key2 = ?');
$sth->execute($val1, $key);

This gives you several benefits: your data is escaped properly, statement handles become cacheable (i.e. you can reuse the same statement handle for multiple updates, which saves time because the DB server does not have to re-parse the SQL), and you don't get people complaining at you to use bind params ;)
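To make the "same statement handle for multiple updates" point concrete, here is a minimal sketch. It assumes DBD::SQLite and an in-memory database purely so that it runs on its own; the table layout and the %new_data hash are made up for illustration and are not taken from your code.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite with an in-memory database, only to keep this self-contained.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical table and starting data.
$dbh->do('create table data_set2 (key2 integer primary key, data text)');
$dbh->do('insert into data_set2 (key2, data) values (?, ?)', undef, $_, 'old')
    for 1 .. 3;

# Hypothetical new values, keyed by key2. Note the apostrophe in the first one.
my %new_data = (1 => "it's new", 2 => 'also new');

# Prepare once, outside the loop ...
my $sth = $dbh->prepare('update data_set2 set data = ? where key2 = ?');

# ... then execute once per row. The values are passed separately from the SQL,
# so the apostrophe needs no manual quoting.
for my $key (sort keys %new_data) {
    $sth->execute($new_data{$key}, $key);
}

my $rows = $dbh->selectall_arrayref(
    'select key2, data from data_set2 order by key2');
print join(' => ', @$_), "\n" for @$rows;
```

The prepare sits outside the loop on purpose; if it moves inside, you lose the re-parse saving and keep only the escaping benefit.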

Now, on to your question.

I have tried several different approaches to what you are trying to do, and the best performance I have been able to get came from putting more intelligence in the SQL and less in the Perl. If you can make the database do more of the work, the Perl has less data to work with and will therefore run faster. If your database does not support subselects and joins, this may be harder than it otherwise would be.

For instance, your example looks like 2 problems.
  1. creating new records for those that do not exist yet
  2. updating records that already exist

The first can be handled by getting a list from the database of only those source records that have no matching record in the destination table yet, i.e.

SELECT id_field, field_to_update FROM table_1 WHERE id_field NOT IN (SELECT join_id_field FROM table_2)
and the second can be handled with slightly more complicated SQL, like this:
SELECT src.id_field, src.field_to_update FROM table_1 src, table_2 dest WHERE src.id_field = dest.join_id_field AND src.field_to_update != dest.field_to_be_updated
This will get you a list of the records that need to be updated.
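Here is a rough sketch of how those two lists could drive the inserts and updates from Perl. It assumes DBD::SQLite with an in-memory database and some made-up sample rows so that it runs on its own; the table and column names are the placeholder names from the queries above. The only change to the second query is the column order in the SELECT, swapped so that each row lines up with the placeholders of the UPDATE.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite with an in-memory database and made-up sample data.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE table_1 (id_field INTEGER PRIMARY KEY, field_to_update TEXT)');
$dbh->do('CREATE TABLE table_2 (join_id_field INTEGER PRIMARY KEY, field_to_be_updated TEXT)');
$dbh->do('INSERT INTO table_1 VALUES (?, ?)', undef, @$_)
    for [1, 'a'], [2, 'b'], [3, 'c'];
$dbh->do('INSERT INTO table_2 VALUES (?, ?)', undef, @$_)
    for [1, 'a'], [2, 'stale'];

# Fetch both lists up front, before anything is written.
my $missing = $dbh->selectall_arrayref(q{
    SELECT id_field, field_to_update
    FROM table_1
    WHERE id_field NOT IN (SELECT join_id_field FROM table_2)
});
my $changed = $dbh->selectall_arrayref(q{
    SELECT src.field_to_update, src.id_field
    FROM table_1 src, table_2 dest
    WHERE src.id_field = dest.join_id_field
      AND src.field_to_update != dest.field_to_be_updated
});

# One prepared handle per kind of write, reused for every row.
my $ins = $dbh->prepare(
    'INSERT INTO table_2 (join_id_field, field_to_be_updated) VALUES (?, ?)');
my $upd = $dbh->prepare(
    'UPDATE table_2 SET field_to_be_updated = ? WHERE join_id_field = ?');

# Do all the writes in a single transaction.
$dbh->begin_work;
$ins->execute(@$_) for @$missing;
$upd->execute(@$_) for @$changed;
$dbh->commit;

printf "%d inserted, %d updated\n", scalar @$missing, scalar @$changed;
```

The point of the design is that Perl only ever sees the rows that actually need work; everything that is already in sync stays in the database.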

To take this a step further, if your database supports it, you can even do the updates purely on the database side, although that query is much more difficult and I do not have the time or inclination to figure it out for an example problem :)
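For what it is worth, one possible shape for the database-side version is an INSERT ... SELECT followed by an UPDATE with a correlated subquery. This is only a sketch: it is an assumption about what would work, tried here against DBD::SQLite with an in-memory database and made-up rows, and other databases may want different syntax (some have their own multi-table UPDATE forms). The table and column names are the placeholder names from the queries above.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite with an in-memory database and made-up sample data.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE table_1 (id_field INTEGER PRIMARY KEY, field_to_update TEXT)');
$dbh->do('CREATE TABLE table_2 (join_id_field INTEGER PRIMARY KEY, field_to_be_updated TEXT)');
$dbh->do('INSERT INTO table_1 VALUES (?, ?)', undef, @$_)
    for [1, 'a'], [2, 'b'], [3, 'c'];
$dbh->do('INSERT INTO table_2 VALUES (?, ?)', undef, @$_)
    for [1, 'a'], [2, 'stale'];

# Problem 1: create the missing records without pulling any rows into Perl.
my $inserted = $dbh->do(q{
    INSERT INTO table_2 (join_id_field, field_to_be_updated)
    SELECT id_field, field_to_update
    FROM table_1
    WHERE id_field NOT IN (SELECT join_id_field FROM table_2)
});

# Problem 2: update only the records whose value differs from the source.
my $updated = $dbh->do(q{
    UPDATE table_2
    SET field_to_be_updated = (
        SELECT src.field_to_update
        FROM table_1 src
        WHERE src.id_field = table_2.join_id_field
    )
    WHERE EXISTS (
        SELECT 1
        FROM table_1 src
        WHERE src.id_field = table_2.join_id_field
          AND src.field_to_update != table_2.field_to_be_updated
    )
});

# do() returns the number of affected rows ("0E0" when there were none).
printf "%d inserted, %d updated\n", $inserted, $updated;
```

Running the insert first is deliberate: the freshly inserted rows already match the source, so the update that follows leaves them alone.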

Hope that helps


In reply to Re: PERL DB Optimization by Tuppence
in thread PERL DB Optimization by 3dbc
