in reply to Re: How can I improve the sql statements applying efficiency
in thread How can I improve the sql statements applying efficiency

roboticus:

This sounds like a good idea, but my SQL statements look like this:

    delete tbl_3 using a primary key
    insert tbl_3 ( about 200 fields )
        select ( tbl_1's about 100 fields, tbl_2's about 100 fields )

It's easy to split out the DELETE statements, but it doesn't look so easy to turn the INSERT statements into a file in bcp format. The query-then-insert pattern may be the root cause of the slowness.

And the script I am using just uses Sybase::CTlib to apply the SQL batches to the Sybase server. Each of my files is one transaction, containing 30K delete-then-insert SQL statements.
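In outline, the applying part of the script does little more than this (a stripped-down sketch; the connection details and the "go" batch separator are assumptions about my setup, and error handling is elided):

    use strict;
    use warnings;
    use Sybase::CTlib;

    my ($user, $passwd, $server, $file) = @ARGV;
    my $dbh = Sybase::CTlib->new($user, $passwd, $server)
        or die "can't connect to $server\n";

    open my $fh, '<', $file or die "can't open $file: $!\n";
    my $sql = do { local $/; <$fh> };   # slurp the whole file
    close $fh;

    # each "go" line ends one batch, the way isql input usually does
    for my $batch (grep { /\S/ } split /^go\s*$/mi, $sql) {
        $dbh->ct_sql($batch);           # real code should check for errors here
    }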

Thanks very much for your advice. I think I should try running the script in parallel to see if that saves some time.
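Something along these lines is what I have in mind for running the files in parallel (just a sketch; apply_file() is a placeholder for the Sybase::CTlib loop above, and each child must open its own connection, since a handle can't be shared across forks):

    # fork one child per SQL file
    my @files = @ARGV;
    my @pids;
    for my $file (@files) {
        my $pid = fork;
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {        # child: apply one file, then exit
            apply_file($file);  # placeholder for the loop above
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid $_, 0 for @pids;    # wait for all the children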

Re^3: How can I improve the sql statements applying efficiency
by Marshall (Canon) on Jul 24, 2010 at 19:30 UTC
    Did you try the previous suggestion of setting manual commit mode at the beginning of this 30K script, then inserting a manual commit command at the end to commit all 30K statements at once? I guess each of the four files you have is taking about 5 hours?

    I like the roboticus methodology of separating the inserts from the deletes, then creating a temp table for the deletes so that they can all be done in one statement - but it sounds like this may not be so easy to implement. Five hours to run 30K of what appear to be fairly straightforward SQL statements is way off what I would expect for performance. Anyway, I remain curious about commit performance. If you try this (it should only take a couple of lines at the beginning and end of your file), please report back what happened.
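    In rough shape, that approach would look something like this (an untested sketch; the #keys temp table and the key_col/col1..col4 names are stand-ins for your real ~200-column schema, and $dbh->ct_sql is just shorthand for however your script sends SQL):

        # untested sketch: load the affected keys once, then do the
        # delete and the re-insert as single set-based operations
        $dbh->ct_sql(q{ create table #keys ( key_col int primary key ) });

        # ...populate #keys with the 30K affected key values...

        $dbh->ct_sql(q{
            delete tbl_3
              from tbl_3, #keys
             where tbl_3.key_col = #keys.key_col
        });

        $dbh->ct_sql(q{
            insert tbl_3
            select t1.key_col, t1.col1, t1.col2, t2.col3, t2.col4
              from #keys k, tbl_1 t1, tbl_2 t2
             where t1.key_col = k.key_col
               and t2.key_col = k.key_col
        });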

    Update: I guess I didn't say why I like the roboticus methodology. It's not because of the batch command or performance; it's because, if you are able to do this, you have a clear list of stuff that is supposed to be there and a clear list of stuff that isn't supposed to be there (i.e. update #99 doesn't depend upon update #15). Reaching this level of clarity may be non-trivial, but if something goes wrong - and it always does (you have to be a pessimist in these matters) - recovery to a "known state" is easier.

    I may be reaching here, but it sounds like you have both a short-term and a longer-term question: (a) I have this pig of a performance problem that I inherited - how can I make it run faster within the framework that I have? and (b) moving forward, how could I use Perl code to re-implement this process so that it "really rocks" performance-wise? I don't know enough yet to talk about (b). I'm not sure that you even want to go to (b) - maybe getting this 20-hour thing to run in, say, 4-5 hours is good enough? I don't know how often you do this or what constitutes "good enough".

      Marshall

      In fact each of my files is a whole transaction, since there is a BEGIN TRAN at the beginning of the file and a COMMIT TRAN at the end. I have also tried treating each statement as its own transaction, but the efficiency seems almost the same.

      I don't get into this kind of situation a lot; in daily work the script only processes, say, 2K records, but this time I want to process all the records accumulated during the past 2 or 3 years. Four to five hours would be good, but more than 20 hours is a little out of control.

        Try improving the SQL statement efficiency along the lines above. I am confident that you will get a 4x-5x performance increase - that's what you need to go from 20 hours to 5 hours or less. This won't achieve 15-20 minutes, but that's not what is needed. The idea, as I understand it, is to make a dramatic improvement in performance with the understanding that the performance won't be optimal.

        If you can "make it work" for this abnormal 2-3 year combined update, I would do so. Pursue other performance enhancements if need be (or not).

Re^3: How can I improve the sql statements applying efficiency
by roboticus (Chancellor) on Jul 24, 2010 at 20:41 UTC

    littlehorse:

    With a little more information, we might be able to provide a bit better guidance. A couple of unsubstantiated notes:

    • If the SQL is deleting and then inserting records with the same primary key, then it may be better to modify the script generating the SQL to use update statements.
    • If the data you're inserting always comes from other tables, it may be that the SQL is applying a particular operation record by record instead of treating the records as a set (see the sketch after this list). For example, if you're updating the price column, then:
      update the_table set price=price*1.2 where material='iron'
      update the_table set price=price*1.1 where material='plastic'

      is much better than
      update the_table set price=17.54 where the_key=123
      update the_table set price=1.50 where the_key=125
      update the_table set price=15.00 where the_key=126
      ...etc...
    • If you're performing different types of transformations, it may be better to operate on columns instead of rows. For example:
      update the_table set price=price*2 where material='iron'
      update the_table set drop_ship=1 where is_heavy=1 or is_large=1
      If you can provide a dozen or so records (sanitized, of course), we could look things over and see if we can offer suggestions.
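    To make the second point concrete, the change in the generating script could be as small as this (a hypothetical sketch - the %new_price hash and its values are invented for illustration):

        # hypothetical: %new_price maps key => new price, as computed
        # by whatever currently emits one UPDATE per key
        my %new_price = ( 123 => 17.54, 125 => 1.50, 126 => 15.00 );

        my %keys_for;                  # group the keys by new value
        while (my ($key, $price) = each %new_price) {
            push @{ $keys_for{$price} }, $key;
        }

        # one set-based UPDATE per distinct value instead of one per row
        for my $price (sort { $a <=> $b } keys %keys_for) {
            my $in = join ',', sort { $a <=> $b } @{ $keys_for{$price} };
            print "update the_table set price=$price where the_key in ($in)\n";
        }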

    ...roboticus

      roboticus

      Thanks for your reply, but in my case using UPDATE statements is not good, since the changed fields vary from record to record: for the 1st record the changed fields might be the first 3, while for the 2nd record they might be the last 3. It would be tedious to work out which fields have changed for each record and then write out an update statement for it.

      For the 3 tables I have mentioned, tbl_3 is essentially a combination of tbl_1 and tbl_2 on each primary key, like this:

      tbl_1:
      key  col1  col2
      ---  ----  ----
      1    abc   123
      2    xyz   987

      tbl_2:
      key  col3  col4
      ---  ----  ----
      1    bcd   12.5
      2    zzz   11.1

      tbl_3:
      key  col1  col2  col3  col4
      ---  ----  ----  ----  ----
      1    abc   123   bcd   12.5
      2    xyz   987   zzz   11.1
      Inspired by your bcp advice, I am now wondering whether I can bcp out tbl_1 and tbl_2, then process the two bcp files to construct the bcp file for tbl_3. That should be much faster than applying the insert statements.
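      Something like this is the merge I have in mind (a sketch only; it assumes character-mode bcp output with tab field terminators, the tbl_1.bcp/tbl_2.bcp/tbl_3.bcp file names, the key in the first column, and a matching tbl_2 row for every tbl_1 key):

        use strict;
        use warnings;

        # read tbl_2 into memory, keyed on the first column; for very
        # large tables, sort both files on the key and merge instead
        my %tbl_2;
        open my $fh2, '<', 'tbl_2.bcp' or die "tbl_2.bcp: $!\n";
        while (<$fh2>) {
            chomp;
            my ($key, $rest) = split /\t/, $_, 2;
            $tbl_2{$key} = $rest;
        }
        close $fh2;

        # join each tbl_1 row to its tbl_2 row and emit a tbl_3 row
        open my $fh1, '<', 'tbl_1.bcp' or die "tbl_1.bcp: $!\n";
        open my $out, '>', 'tbl_3.bcp' or die "tbl_3.bcp: $!\n";
        while (<$fh1>) {
            chomp;
            my ($key, $rest) = split /\t/, $_, 2;
            next unless exists $tbl_2{$key};   # or warn about misses
            print $out join("\t", $key, $rest, $tbl_2{$key}), "\n";
        }
        close $fh1;
        close $out;

      Then, after running the deletes, the result could be bcp'd into tbl_3, avoiding the per-row insert-select entirely.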