in reply to Re^2: How can I improve the sql statements applying efficiency
in thread How can I improve the sql statements applying efficiency

Did you try the previous suggestion of setting manual-commit mode at the beginning of this 30K-statement script and then issuing a single commit at the end, so that all 30K statements are committed at once? I take it each of the 4 files you have is taking about 5 hours?

I like the roboticus methodology of separating the inserts from the deletes, then creating a temp table for the deletes so that they can all be done in one statement - but it sounds like this may not be so easy to implement. 5 hours to run 30K of what appear to be fairly straightforward SQL statements sounds way off what I would expect for performance. Anyway, I remain curious about the commit performance. If you try this (it should only take a couple of lines at the beginning and end of your file), please report back what happened.
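To make the commit suggestion concrete: the idea is that committing after every statement forces a disk sync per statement, while wrapping the whole batch in one transaction pays that cost once. This is just a minimal sketch of the pattern; it uses Python's sqlite3 as a stand-in for whatever database and driver the 30K-statement script actually targets, and the table and column names are made up.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so we
# control transactions explicitly, just like adding a manual BEGIN at
# the top of the script and a COMMIT at the bottom.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

# Slow pattern (not shown): one transaction/commit per INSERT.
# Faster pattern: run all statements inside one transaction and
# commit once at the end.
con.execute("BEGIN")
for i in range(30000):
    con.execute("INSERT INTO t (id, val) VALUES (?, ?)", (i, f"row{i}"))
con.execute("COMMIT")  # single commit for all 30K inserts

print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # → 30000
```

In Perl DBI terms the equivalent would be connecting with AutoCommit turned off and calling commit once after the loop; the transaction bracketing is the same either way.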

Update: I guess I didn't say why I like the roboticus methodology. It's not because of the batch command or performance; it's because, if you are able to do this, you have a clear list of stuff that is supposed to be there and a clear list of stuff that isn't supposed to be there (i.e. update #99 doesn't depend upon update #15). Reaching this level of clarity may be non-trivial, but if something goes wrong, and it always does (you have to be a pessimist in these matters), recovery to a "known state" is easier.
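Here is a small sketch of the temp-table-for-deletes idea: collect every key slated for deletion into a temp table, then remove them all with one DELETE instead of thousands. Again this uses sqlite3 purely for illustration, and the orders/doomed tables and the odd-id delete list are hypothetical stand-ins for the OP's real schema.

```python
import sqlite3

# Hypothetical schema; the real tables and keys come from the OP's scripts.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, "old" if i % 2 else "new") for i in range(10)])

# Step 1: load every key that should NOT be there into a temp table.
con.execute("CREATE TEMP TABLE doomed (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO doomed VALUES (?)",
                [(i,) for i in range(10) if i % 2])

# Step 2: one set-based DELETE replaces one DELETE statement per row.
con.execute("BEGIN")
con.execute("DELETE FROM orders WHERE id IN (SELECT id FROM doomed)")
con.execute("COMMIT")

print(con.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # → 5
```

The side benefit, as above, is auditability: the doomed table *is* the clear list of what isn't supposed to be there, which you can inspect before and after the delete runs.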

I may be reaching here, but it sounds like you probably have both a short-term and a longer-term question: a) I have this pig of a performance problem that I inherited - how can I make it run faster within the framework that I have? and b) moving forward, how could I use Perl code to re-implement this process so that it "really rocks" performance-wise? I don't know enough yet to talk about (b). I'm not sure that you even want to go to (b). Maybe getting this 20-hour thing to run in say 4-5 hours is good enough? I don't know how often you do this or what constitutes "good enough".


Replies are listed 'Best First'.
Re^4: How can I improve the sql statements applying efficiency
by littlehorse (Sexton) on Jul 25, 2010 at 07:09 UTC
    Marshall

    In fact, each of my files is a whole transaction, since there is a BEGIN TRAN at the beginning of the file and a COMMIT TRAN at the end. I have also tried treating each statement as its own transaction, but the performance seems almost the same.

    I don't get into this kind of situation a lot; in daily work the script only processes, say, 2K records, but this time I want to process all the records accumulated over the past 2 or 3 years. 4 to 5 hours would be good, but more than 20 hours is a little out of control.

      Try improving the efficiency of the SQL statements themselves. I am confident that you will get a 4x-5x performance increase, which is what you need to go from 20 hours to 5 hours or less. This won't get you to 15-20 minutes, but that's not what is needed. The idea, as I understand it, is to make a dramatic improvement in performance with the understanding that the performance won't be optimal.

      If you can "make it work" for this abnormal 2-3 year combined update, I would do so. Pursue other performance enhancements if need be (or not).