littlehorse has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I have 4 files, each about 240MB, each containing DELETE and INSERT SQL statements. I am using a script to read the SQL statements and apply them to the Sybase database, but it took more than 20 hours to finish. Any ideas on how I can improve the efficiency?

Thanks in advance.

Update:

My SQL statements just look like this:

delete tbl_3 using a primary key

insert tbl_3 ( about 200 fields )
    select tbl_1's ~100 fields, tbl_2's ~100 fields

And the script I am using just uses Sybase::CTlib to apply the SQL batches to the Sybase server. Each of my files is a single transaction, containing 30k delete-then-insert SQL statements.
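
For reference, the loader is essentially the usual Sybase::CTlib execute-and-drain-results loop. A rough sketch of that pattern (not the actual script, which I didn't write; the login details and the assumption that batches are separated by 'go' lines are mine):

use strict;
use warnings;
use Sybase::CTlib;

my $dbh = new Sybase::CTlib 'sa', 'password', 'SYBSERVER'
    or die "can't connect to Sybase";

$/ = "\ngo\n";                       # assume batches are separated by 'go' lines
open my $fh, '<', $ARGV[0] or die "$ARGV[0]: $!";
while (my $batch = <$fh>) {
    $batch =~ s/\ngo\n\z//;          # 'go' is a batch separator, not SQL
    next unless $batch =~ /\S/;
    $dbh->ct_execute($batch) == CS_SUCCEED
        or die "ct_execute failed";
    # Drain all result sets so the connection is ready for the next batch.
    my $restype;
    while ($dbh->ct_results($restype) == CS_SUCCEED) {
        next unless $dbh->ct_fetchable($restype);
        while (my @row = $dbh->ct_fetch) { }   # DML returns no rows to fetch
    }
}
close $fh;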

Replies are listed 'Best First'.
Re: How can I improve the sql statements applying efficiency
by roboticus (Chancellor) on Jul 24, 2010 at 06:13 UTC

    littlehorse:

    Each time you insert or delete a record, the database server has to ensure that indexes are properly maintained and perform a *lot* of other housekeeping. Since each SQL statement is processed one at a time, most of that housekeeping is redundant and wasted.

    If you want to get it to be as fast as possible, then you should split your file into two: The first one would simply be a formatted list of the records you want inserted into the database. Then you can insert the data using BCP (a bulk record loader).

    The second file should simply be a list of the primary keys that you want to delete. Create a temporary table to hold them, and use BCP to insert the keys into it. Then you can delete all the records using a single SQL statement, something like:

    delete the_table where the_key in ( select the_key from temporary_table )

    Then, of course, you can drop your temporary table.
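
    In Perl, the whole round trip might look roughly like this (a sketch only: the database, login, file and key names are invented, and it assumes the key list and the new rows have already been written out in bcp's character format):

    use strict;
    use warnings;
    use Sybase::CTlib;

    my ($user, $pass, $server, $db) = ('sa', 'password', 'SYBSERVER', 'mydb');
    my $dbh = new Sybase::CTlib $user, $pass, $server or die "connect failed";
    $dbh->ct_sql("use $db");

    # A plain work table for the keys to delete; bcp runs in its own session,
    # so a session-local #temp table would not be visible to it.
    $dbh->ct_sql('create table del_keys (the_key int not null)');

    # Bulk load the key list (character mode), then delete everything in one go.
    system("bcp $db..del_keys in keys.dat -c -U $user -P $pass -S $server") == 0
        or die 'bcp of keys failed';
    $dbh->ct_sql('delete the_table where the_key in (select the_key from del_keys)');
    $dbh->ct_sql('drop table del_keys');

    # Bulk load the replacement rows.
    system("bcp $db..the_table in new_rows.dat -c -U $user -P $pass -S $server") == 0
        or die 'bcp of new rows failed';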

    With a reasonable database server and table definitions, I'd expect you to get the time to less than a half hour.

    ...roboticus

      roboticus:

      This sounds like a good idea, but my SQL statements just look like this:

      delete tbl_3 using a primary key

      insert tbl_3 ( about 200 fields )
          select tbl_1's ~100 fields, tbl_2's ~100 fields

      It's easy to split out the DELETE statements, but it doesn't look so easy to convert the INSERT statements into bcp format, since their values come from a SELECT on tbl_1 and tbl_2. Querying and then inserting may be the root cause of the slowness.

      And the script I am using just uses Sybase::CTlib to apply the SQL batches to the Sybase server, and each of my files is a single transaction containing 30k delete-then-insert SQL statements.

      Thanks very much for your advice. I think maybe I should try running the script in parallel to see if it can save some time.
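
      Something like this, perhaps (just a sketch: it assumes the four files really are independent of one another, and 'apply_sql.pl' and 'batch_*.sql' are made-up names standing in for the existing loader script and the input files):

      use strict;
      use warnings;
      use Parallel::ForkManager;

      my @files = glob 'batch_*.sql';          # the four ~240MB files (name pattern assumed)
      my $pm    = Parallel::ForkManager->new(scalar @files);

      for my $file (@files) {
          $pm->start and next;                 # parent: move on to the next file
          # Each child is a separate process, so it gets its own Sybase connection.
          system('perl', 'apply_sql.pl', $file) == 0
              or warn "loader failed for $file";
          $pm->finish;
      }
      $pm->wait_all_children;

      Whether that actually helps will depend on whether the server, its transaction log and the disks can keep up with four sessions at once.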

        Did you try the previous suggestion of setting manual commit mode at the beginning of this 30K script and then inserting a manual commit command at the end of it, so that all 30K statements are committed at once? I guess each of the 4 files you have is taking about 5 hours?

        I like the roboticus methodology of separating the inserts from the deletes, then creating a temp table for the deletes so that they can all be done in one statement - but it sounds like this may not be so easy to implement. 5 hours to run 30K of what appear to be fairly straightforward SQL statements sounds way off what I would expect for performance. Anyway, I remain curious about commit performance. If you try this (it should only take a couple of lines at the beginning and end of your file - see the sketch at the end of this note), please report back what happened.

        Update: I guess I didn't say why I like the roboticus methodology. It's not because of batch commands or performance; it's because, if you are able to do this, you have a clear list of stuff that is supposed to be there and a clear list of stuff that isn't supposed to be there (i.e. update #99 doesn't depend upon update #15). Reaching this level of clarity may be non-trivial, but if something goes wrong - and it always does (you have to be a pessimist in these matters) - recovery to a "known state" is easier.

        I may be reaching here, but it sounds like you have perhaps a short-term and a longer-term question: a) I have this pig of a performance thing that I inherited - how can I make it run faster within the framework that I have? and b) moving forward, how could I use Perl code to re-implement this process so that it "really rocks" performance-wise? I don't know enough to talk about (b) yet. I'm not sure that you even want to go to (b). Maybe getting this 20-hour thing to run in, say, 4-5 hours is good enough? I don't know how often you do this or what constitutes "good enough".
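
        For reference, the "couple of lines" I have in mind are just an explicit transaction wrapped around the whole file, something like this (a sketch with invented file names; note that a single transaction this size needs room to grow in the transaction log):

        use strict;
        use warnings;

        open my $in,  '<', 'batch_1.sql'         or die "batch_1.sql: $!";
        open my $out, '>', 'batch_1_onetran.sql' or die "batch_1_onetran.sql: $!";

        print {$out} "begin transaction\ngo\n";   # the line at the beginning
        print {$out} $_ while <$in>;              # the original 30K statements, unchanged
        print {$out} "\ncommit transaction\ngo\n"; # the line at the end
        close $out or die $!;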

        littlehorse:

        With a little more information, we might be able to provide a bit better guidance. A couple of unsubstantiated notes:

        • If the SQL is deleting and then inserting records with the same primary key, then it may be better to modify the script generating the SQL to use update statements instead (see the sketch after this list).
        • If the data you're inserting always comes from other tables, it may be that the SQL is applying a particular operation record-by-record instead of treating the records as a set. For example, if you're updating the price column, then:
          update the_table set price=price*1.2 where material='iron'
          update the_table set price=price*1.1 where material='plastic'

          is much better than
          update the_table set price=17.54 where the_key=123
          update the_table set price=1.50 where the_key=125
          update the_table set price=15.00 where the_key=126
          ...etc...
        • If you're performing different types of transformations, it may be better to operate on columns instead of rows. For example:
          update the_table set price=price*2 where material='iron'
          update the_table set drop_ship=1 where is_heavy=1 or is_large=1

        If you can provide a dozen or so records (sanitized, of course), we could look things over and see if we can offer suggestions.
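
        To illustrate the first note: when each delete/insert pair just replaces a row with the same key, the pair can often be collapsed into one set-based update driven from the source tables. A rough sketch, with invented column and key names ($dbh being the Sybase::CTlib handle the loader already has):

        # $dbh is the existing Sybase::CTlib connection; tbl_3 is rebuilt from
        # tbl_1 and tbl_2 in a single set-based statement instead of per-key
        # delete/insert pairs.  Column names are purely illustrative.
        my $sql = q{
            update tbl_3
            set    col_a = t1.col_a,
                   col_b = t2.col_b
            from   tbl_3, tbl_1 t1, tbl_2 t2
            where  tbl_3.the_key = t1.the_key
              and  tbl_3.the_key = t2.the_key
        };
        $dbh->ct_sql($sql);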

        ...roboticus

Re: How can I improve the sql statements applying efficiency
by intel (Beadle) on Jul 24, 2010 at 02:17 UTC
    Well, let's see the script that you're using to read the SQL.
    Efficiency in this case may not just be about the script you're using, however. It could very well be how those SQL statements are structured, what they are doing, and how your data is set up in the DB. I've just got nothing to go on here to even begin to answer your question. Please re-ask and I'd love to help: How (Not) To Ask A Question
      Since the script was not written by me, my purpose was to get some general guidelines here. Maybe I should dig into the script and come back again. Thanks for your advice.
Re: How can I improve the sql statements applying efficiency
by morgon (Priest) on Jul 24, 2010 at 02:29 UTC
    This looks like a database issue - this has nothing to do with Perl.

    You have to look at your database design (how many indices have to be updated for every row you delete or insert), how often you commit in between (after every row, after every n rows - or only once?), etc.

    Maybe you should also think about doing the processing not in one but in several parallel sessions, but all that depends on a lot of things you have told us nothing about...

Re: How can I improve the sql statements applying efficiency
by Marshall (Canon) on Jul 24, 2010 at 04:48 UTC
    One thing for you to try is adjusting the autocommit behavior. I found a link for you... sybase autocommit. Read through this to figure out the Sybase method and lingo. Basically the DB does a lot of bookkeeping to preserve integrity when modifying the DB. If you turn this off, performance will jump by a considerable factor, at the cost of not being able to recover if there is a crash while this huge script is running. In lots of scenarios that "risk" is just fine and you can just start completely over. This is something that I would do when, say, doing the initial creation of a million-line table in MySQL (it takes about 40 seconds this way, and I don't care if I have to delete the whole table and start completely over). Your mileage will vary.
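
    In Sybase terms, one way to get that behaviour from the existing connection is chained transaction mode: statements stop committing individually and you commit once at the end. A sketch only (login details are invented, and run_all_batches() stands in for whatever loop the real script already has):

    use strict;
    use warnings;
    use Sybase::CTlib;

    my $dbh = new Sybase::CTlib 'sa', 'password', 'SYBSERVER' or die "connect failed";
    $dbh->ct_sql('set chained on');     # manual-commit mode for this session
    run_all_batches($dbh);              # the existing 30k-statement loop (name invented)
    $dbh->ct_sql('commit work');        # one commit for the whole run
    $dbh->ct_sql('set chained off');
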
Re: How can I improve the sql statements applying efficiency
by erix (Prior) on Jul 24, 2010 at 09:46 UTC

    It's sometimes possible to rewrite such files into bulk load format for the bulk utility of the database (bcp, in the case of Sybase). Such bulk loading is typically much faster than executing SQL statements.

    update: Nevermind, I see roboticus already mentioned this...

Re: How can I improve the sql statements applying efficiency
by mpeppler (Vicar) on Aug 15, 2010 at 07:49 UTC
    As others have pointed out - this is a database issue, not a Perl issue.

    You need to look at the query plan generated for the delete and each of the select statements to ensure that these are as optimal as possible.
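
    One low-effort way to get at those plans (a sketch; the file names and the 'go' batch separator are assumptions): pull one representative delete and one insert out of a batch file and wrap them in "set showplan on", so just those two statements can be run on their own - from isql, for instance - and their plans inspected.

    use strict;
    use warnings;

    # Extract the first delete and the first insert from one of the big files
    # and write them to a small file wrapped in showplan, so the query plans
    # can be checked without running all 30k statements.
    $/ = "\ngo\n";                                  # assume batches end in 'go'
    open my $in,  '<', 'batch_1.sql'    or die "batch_1.sql: $!";
    open my $out, '>', 'check_plan.sql' or die "check_plan.sql: $!";

    print {$out} "set showplan on\ngo\n";
    my %seen;
    while (my $batch = <$in>) {
        if    (!$seen{delete} && $batch =~ /^\s*delete\b/mi) { print {$out} $batch; $seen{delete} = 1 }
        elsif (!$seen{insert} && $batch =~ /^\s*insert\b/mi) { print {$out} $batch; $seen{insert} = 1 }
        last if $seen{delete} && $seen{insert};
    }
    print {$out} "set showplan off\ngo\n";
    close $out;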

    Then you may need to look at the server itself to see if you can tune the User Log Cache, and maybe bind a specific named cache to the database's log using 4k IO (i.e. two pages) as this can improve things as well. You also need to look at the actual disk IO subsystem to ensure that this is properly configured.

    Michael