in reply to Improving on MS-SQL over DBI and DBD::ODBC

Usually, database program slowness is due to the database itself (i.e. the SQL code executing on the server) rather than to the access method - so the first thing I'd do is verify that your SQL statements perform well (correct query plans, good index usage, etc.).

The second thing, given a slow network connection, is to keep as many of the operations as possible on the server. Avoid pulling data from the server, processing it in Perl, and reinserting it back into the server - instead, write SQL that does the work so the data stays on the server.
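
A minimal sketch of the difference, with a placeholder DSN and made-up table and column names - the fetch-and-reinsert version is shown commented out above the server-side version:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical DSN, credentials, table and column names.
    my $dbh = DBI->connect('dbi:ODBC:MyDSN', 'user', 'secret',
                           { RaiseError => 1 });

    # The slow way over a thin pipe: pull every row down, transform
    # it in Perl, and push each row back up again.
    # my $rows = $dbh->selectall_arrayref('SELECT id, price FROM items');
    # $dbh->do('UPDATE items SET price = ? WHERE id = ?',
    #          undef, $_->[1] * 1.1, $_->[0]) for @$rows;

    # Better: express the transformation in SQL so the data never
    # leaves the server.
    $dbh->do('UPDATE items SET price = price * 1.1');

    $dbh->disconnect;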

Michael


Re^2: Improving on MS-SQL over DBI and DBD::ODBC
by bprew (Monk) on Nov 24, 2004 at 16:33 UTC

    I would agree with mpeppler. In my experience, you can usually make the application perform more work on the database, and this can reduce the amount of data you're pushing around, especially over an ISDN line.

    Also, I don't know how quickly you can connect to the database, but I've seen new connections to an Oracle database take up to 20 seconds.

    So if you have multiple connections in your program, it might be worthwhile to "cache" the database handle somewhere and connect only once. Unfortunately, I don't know much about the application overall, so it's hard to suggest architectural changes when one doesn't know the architecture to begin with :).
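
    For what it's worth, DBI has this built in: connect_cached returns the same handle for identical connection arguments, so only the first call pays the connection cost. A minimal sketch, with placeholder DSN and credentials:

        use strict;
        use warnings;
        use DBI;

        # connect_cached hands back the same underlying handle for
        # identical arguments (and transparently reconnects if the
        # cached handle has died), so repeated calls are cheap.
        sub dbh {
            return DBI->connect_cached('dbi:ODBC:MyDSN', 'user', 'secret',
                                       { RaiseError => 1, AutoCommit => 1 });
        }

        my ($count) = dbh()->selectrow_array('SELECT COUNT(*) FROM items');
        print "items: $count\n";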

    Finally, most of the time I find myself attempting to optimize something that isn't actually a problem. Do you have mathematical evidence (i.e. hard numbers) showing that the ODBC driver is solely the cause of your performance issues? I'm not saying you're wrong, but I want to make sure you're not trying to solve the wrong problem.
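
    One cheap way to get those numbers is to time each phase separately with the core Time::HiRes module - a rough sketch, with placeholder DSN and query:

        use strict;
        use warnings;
        use DBI;
        use Time::HiRes qw(gettimeofday tv_interval);

        # Time connect, execute, and fetch separately so you know
        # which phase is actually eating the wall-clock time.
        my $t0  = [gettimeofday];
        my $dbh = DBI->connect('dbi:ODBC:MyDSN', 'user', 'secret',
                               { RaiseError => 1 });
        printf "connect: %.3fs\n", tv_interval($t0);

        $t0 = [gettimeofday];
        my $sth = $dbh->prepare('SELECT * FROM items');
        $sth->execute;
        printf "execute: %.3fs\n", tv_interval($t0);

        $t0 = [gettimeofday];
        my $rows = 0;
        $rows++ while $sth->fetchrow_arrayref;
        printf "fetch %d rows: %.3fs\n", $rows, tv_interval($t0);

    DBI also ships with DBI::Profile (enabled via the DBI_PROFILE environment variable) if you want finer-grained per-statement numbers.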


    --
    Ben
      Also, I don't know how quickly you can connect to the database, but I've seen new connections to an Oracle database take up to 20 seconds.
      MS-SQL (and Sybase) is pretty good in that respect - creating a new connection is a relatively fast operation. That doesn't invalidate your point, though - reconnecting over and over is not a good thing, especially over a slow link.

      Michael

      Hm, thank you for the SQL construction suggestions. I have tried to push as much as possible onto the server, but I'm not an SQL expert; perhaps I should find one to take a look at my statements. Unfortunately, I'm not sure there's a whole lot of optimization I can do in this manner, as a large portion of my work is to get DB records and compare them with text files or other data sources.

      I don't ever open multiple connections, except when there are multiple instances of my application (which can't be helped), so that isn't the cause of my woes. One db handle is enough, thank you. ;-)

      I can't prove that the ODBC driver is the problem. I do know that performing an identical query with a third-party SQL tool is much faster (even after accounting for connection times). I also know that native access methods are often faster than ODBC, so I'm hoping to find a DBD driver that talks to MS-SQL natively. Then I can at least benchmark the two.

      Finally, while my performance is "acceptable" to my users and my employer, I consider it borderline; I want to refactor before adding yet more DB functionality.

      I wish I could just post the code, but NDA and all that... so please ask me anything you need to help you formulate a starting point for me to research.


      radiantmatrix
      require General::Disclaimer;

        Fetching 50k rows over a slow connection is bound to be slow, no matter what you do. I just tried it: pulling 50k rows from a pretty fast server (4-way Xeon, relatively lightly loaded) over my DSL line, which manages approximately 80KB/second, using a pure C program and a SQL query that simply fetches the rows (no WHERE clause, no ORDER BY). Just grabbing the data takes almost two minutes - which is about what you'd expect, since at 80KB/second two minutes of transfer is roughly 9-10MB, i.e. around 200 bytes per row, before any processing happens at all.

        In this case I think your major issue is the amount of data that you have to pull over the slow connection - I'd try to find ways to reduce that - maybe some client-side caching if the same data can be re-used, or a better/more specific WHERE clause to avoid fetching data that is not relevant.
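
        A naive sketch of client-side caching with the core Storable module - the cache path, TTL, DSN, and query are all made-up placeholders:

            use strict;
            use warnings;
            use DBI;
            use Storable qw(store retrieve);

            my $cache_file = '/tmp/items.cache';   # placeholder path
            my $ttl        = 3600;                 # reuse for one hour

            # Reuse the cached result set if it's still fresh instead
            # of dragging 50k rows over the wire again.
            my $rows;
            if (-e $cache_file && time - (stat $cache_file)[9] < $ttl) {
                $rows = retrieve($cache_file);     # cheap local read
            }
            else {
                my $dbh = DBI->connect('dbi:ODBC:MyDSN', 'user', 'secret',
                                       { RaiseError => 1 });
                $rows = $dbh->selectall_arrayref('SELECT id, name FROM items');
                store($rows, $cache_file);         # refresh the cache
            }
            printf "got %d rows\n", scalar @$rows;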

        Michael

        Hmm... I have found in the past that you can make some good strides by looking not only at the SQL statement itself, but at how it's used in the context of the system.

        The real question is: where are you spending your time? Is it in the SQL query itself, in the record fetching, or in the comparison? If you're making a lot of small queries, perhaps there is a way to combine them into a few larger ones, as in the sketch below.
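
        For instance, a batch of single-row lookups can often be collapsed into one IN-list query - a sketch with made-up table and column names:

            use strict;
            use warnings;
            use DBI;

            my $dbh = DBI->connect('dbi:ODBC:MyDSN', 'user', 'secret',
                                   { RaiseError => 1 });

            my @ids = (1, 2, 3, 42, 99);   # e.g. keys from a text file

            # One round trip per id is expensive over a slow link:
            # my %row = map {
            #     $_ => $dbh->selectrow_arrayref(
            #               'SELECT * FROM items WHERE id = ?', undef, $_)
            # } @ids;

            # One round trip total, with a placeholder per id:
            my $in  = join ',', ('?') x @ids;
            my $all = $dbh->selectall_arrayref(
                "SELECT * FROM items WHERE id IN ($in)", undef, @ids);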

        It sounds like you're fetching a fair number of rows (60k+). Is there a way to reduce that - perhaps by caching the data on the client side, so you don't have to grab everything every time, or by letting the user filter the data better, so they don't have to look at as much of it?

        Without knowing some of the business requirements and more about the environment, it's hard to make pointed suggestions, but hopefully this gives you something to think about.


        --
        Ben