Re: How can I improve the sql statements applying efficiency
by roboticus (Chancellor) on Jul 24, 2010 at 06:13 UTC
littlehorse:
Each time you insert or delete a record, the database server has to ensure that indexes are properly maintained, along with a *lot* of housekeeping. Since each SQL statement has to be processed one at a time, most of the housekeeping is redundant and wasted.
If you want to get it to be as fast as possible, then you should split your file into two: The first one would simply be a formatted list of the records you want inserted into the database. Then you can insert the data using BCP (a bulk record loader).
The second file should simply be a list of the primary keys that you want to delete. Create a temporary table to hold them, and use BCP to insert the keys into that table. Then you can delete all the records using a single SQL statement, something like:
delete the_table where the_key in (
    select the_key from temporary_table
)
Then, of course, you can drop your temporary table.
With a reasonable database server and table definitions, I'd expect you to get the time to less than a half hour.
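As a rough sketch of the split (the file names, login details, and the exact shape of the DELETE statements below are assumptions, not taken from the actual script; adjust the regex to whatever the real statements look like), pulling the keys out for bcp could look something like this in Perl:

#!/usr/bin/perl
# Pull the primary keys out of the DELETE statements into a flat file
# that bcp can load into a temporary key table (one key per line).
use strict;
use warnings;

open my $in,   '<', 'batch.sql'    or die "batch.sql: $!";
open my $keys, '>', 'del_keys.dat' or die "del_keys.dat: $!";

while (my $line = <$in>) {
    if ($line =~ /^\s*delete\s+tbl_3\s+where\s+the_key\s*=\s*(\S+)/i) {
        print {$keys} "$1\n";
    }
}
close $keys;

# Then load the keys and delete in one statement (server/login are placeholders):
#   bcp mydb..tmp_del_keys in del_keys.dat -c -Umyuser -Pmypass -Smyserver
#   delete tbl_3 where the_key in (select the_key from tmp_del_keys)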
...roboticus
delete tbl_3 using a primary key
insert tbl_3
(
    about 200 fields
)
select
    tbl_1's about 100 fields,
    tbl_2's about 100 fields
It's easy to split out the DELETE statements, but it doesn't look so easy to turn the INSERT ... SELECT statements into a file in bcp format.
The query-then-insert is probably the root cause of the slowness.
The script I am using just uses Sybase::CTlib to apply the SQL batches to the Sybase server, and each of my files is one transaction containing 30k delete-then-insert SQL statements.
Thanks very much for your advice. I think I should try running the script in parallel to see if that saves some time.
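As a minimal sketch of the parallel idea (the file names, server, and login below are placeholders, and whether this helps at all depends on locking and contention on tbl_3), one approach is to fork one child per file and push each file through isql in its own session:

use strict;
use warnings;

my @files = qw(batch1.sql batch2.sql batch3.sql batch4.sql);   # assumed names
my @pids;

for my $file (@files) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {            # child: apply one file in its own session
        exec 'isql', '-Umyuser', '-Pmypass', '-Smyserver', '-i', $file
            or die "exec isql: $!";
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;        # wait for all sessions to finish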
Did you try the previous suggestion of setting manual commit mode at the beginning of this 30K-statement script and then inserting a manual commit command at the end of it to commit all 30K statements at once? I guess each of the 4 files you have is taking about 5 hours?
I like the roboticus methodology of separating the inserts from the deletes, then creating a temp table for the deletes so that they can all be done in one statement - but it sounds like this may not be so easy to implement. 5 hours to run 30K of what appear to be fairly straightforward SQL statements sounds way off what I would expect for performance. Anyway, I remain curious about commit performance. If you try this (it should only take a couple of lines at the beginning and end of your file), please report back what happened.
Update: I guess I didn't say why I like the roboticus methodology. It's not because of the batch command or performance; it's because if you are able to do this, you have a clear list of stuff that is supposed to be there and a clear list of stuff that isn't supposed to be there (i.e. update #99 doesn't depend upon update #15). Reaching this level of clarity may be non-trivial, but if something goes wrong, and it always does (you have to be a pessimist in these matters), recovery to a "known state" is easier.
I may be reaching here, but it sounds like you probably have a short-term and a longer-term question: a) I have this performance pig that I inherited - how can I make it run faster within the framework that I have? and b) moving forward, how could I use Perl code to re-implement this process so that it "really rocks" performance-wise? I don't know enough to talk about (b) yet. I'm not sure that you even want to go to (b)? Maybe getting this 20-hour thing to run in, say, 4-5 hours is good enough? I don't know how often you do this or what constitutes "good enough".
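For reference, a minimal way to try the single-commit idea without touching the Perl script is to wrap each file in one transaction. This is only a sketch: the file name and the use of "go" batch separators are assumptions about how the files are laid out.

use strict;
use warnings;

my $file = 'batch.sql';                     # placeholder name
open my $in,  '<', $file           or die "$file: $!";
open my $out, '>', "$file.wrapped" or die "$file.wrapped: $!";

print {$out} "begin transaction\ngo\n";     # open one transaction up front
print {$out} $_ while <$in>;                # copy the 30K statements as-is
print {$out} "commit transaction\ngo\n";    # commit everything at the end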
Re: How can I improve the sql statements applying efficiency
by intel (Beadle) on Jul 24, 2010 at 02:17 UTC
well, let's see the script that you're using to read the SQL.
Efficiency in this case may not just be about the script you're using, however. It could very well be how those SQL statements are structured, what they are doing, and how your data is set up in the DB. I've just got nothing to go on here to even begin to answer your question.
Please re-ask and I'd love to help.
How (Not) To Ask A Question
Since the script wasn't written by me, my purpose was to get some general guidance here. Maybe I should dig into the script and then come back again. Thanks for your advice.
Re: How can I improve the sql statements applying efficiency
by morgon (Priest) on Jul 24, 2010 at 02:29 UTC
This looks like a database issue - this has nothing to do with Perl.
You have to look at your database design (how many indexes have to be updated for every row you delete or insert), how often you commit in between (for every row, for every n rows - or only once?), and so on.
Maybe you should also think about doing the processing not in one but in several parallel sessions, but all that depends on a lot of things you have told us nothing about...
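To illustrate the commit-frequency point, here is a sketch that commits every 1000 statements instead of once per statement or only once at the end. It uses DBI/DBD::Sybase for brevity (the OP's script uses Sybase::CTlib, so the same idea would have to be expressed with that API instead), and it assumes one statement per line, which a real file probably won't satisfy without a proper statement splitter; server and login are placeholders.

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Sybase:server=myserver', 'myuser', 'mypass',
                       { AutoCommit => 0, RaiseError => 1 });

open my $in, '<', 'batch.sql' or die "batch.sql: $!";
my $count = 0;
while (my $stmt = <$in>) {
    next if $stmt =~ /^\s*(go)?\s*$/i;      # skip blank lines and batch separators
    $dbh->do($stmt);
    $dbh->commit unless ++$count % 1000;    # commit every 1000 statements
}
$dbh->commit;                               # commit whatever is left
$dbh->disconnect;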
Re: How can I improve the sql statements applying efficiency
by Marshall (Canon) on Jul 24, 2010 at 04:48 UTC
One thing for you to try is adjusting the autocommit behavior. I found a link for you...
sybase autocommit. Read through this to figure out the Sybase method and lingo. Basically the DB does a lot of bookkeeping to preserve integrity when modifying the DB. If you turn this off, performance will jump by a considerable factor, at the cost of not being able to recover if there is a crash while this huge script is running. In lots of scenarios that "risk" is just fine and you can simply start completely over. This is something that I would do when, say, doing the initial creation of a million-line table in MySQL (it takes about 40 seconds this way, and I don't care if I have to delete the whole table and start completely over). Your mileage will vary.
Re: How can I improve the sql statements applying efficiency
by erix (Prior) on Jul 24, 2010 at 09:46 UTC
It's sometimes possible to rewrite such files into bulk-load format for the bulk utility of the database (bcp, in the case of Sybase). (Such bulk loading is typically much faster than executing SQL statements.)
update: Nevermind, I see roboticus already mentioned this...
Re: How can I improve the sql statements applying efficiency
by mpeppler (Vicar) on Aug 15, 2010 at 07:49 UTC
As others have pointed out - this is a database issue, not a Perl issue.
You need to look at the query plan generated for the delete and each of the select statements to ensure that these are as optimal as possible.
Then you may need to look at the server itself to see if you can tune the User Log Cache, and maybe bind a specific named cache to the database's log using 4k IO (i.e. two pages) as this can improve things as well. You also need to look at the actual disk IO subsystem to ensure that this is properly configured.
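For the query-plan part, one way to look at the plans without actually running the statements is a showplan/noexec probe through isql. This is only a sketch: the statements in the probe are placeholders copied from the thread and would need to be replaced with real ones from the batch files, and the login details are assumptions.

use strict;
use warnings;

# Turn showplan and noexec on in their own batch, then send a representative
# statement so the optimizer compiles it and prints the plan without running it.
my $probe = <<'SQL';
set showplan on
set noexec on
go
delete tbl_3 where the_key = 12345
go
SQL

open my $fh, '>', 'showplan_probe.sql' or die "showplan_probe.sql: $!";
print {$fh} $probe;
close $fh;

system('isql', '-Umyuser', '-Pmypass', '-Smyserver', '-i', 'showplan_probe.sql') == 0
    or warn "isql exited with $?";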
Michael