banesong has asked for the wisdom of the Perl Monks concerning the following question:

I am the maintainer of a ActiveState/Windows based web application. The RDBMS that is used as the back end is MS SQL. My question is in regards to how the web front end talks to the back end database. Over the past 10 years, we have been using the DBD::ODBC module, and it works fine. What I would like to know, is there a better method to use? I want to reduce the connection/query time as much as possible, and am aware that ODBC is not the most efficient method. The architecture is fairly simple - two servers, one serving the UI, the other performing the database services. They are connected by a private gigE cable. Thanks in advance.

Replies are listed 'Best First'.
Re: Most efficient MS SQL connection
by roboticus (Chancellor) on Mar 02, 2010 at 17:10 UTC

    banesong:

    I wouldn't worry about it until database communications become a significant fraction of the time your application consumes. If communications takes 1% of the time, then improving communications speed by 90% will still make less than a 1% change.

    Go after the big payoffs.

    If your db communications *is* a signification fraction, then try other communication methods, such as DBD::Sybase and see which ones give you the best payoff.

    ...roboticus

      Thanks - I will say that DB communication is a significant portion of the time (aside from the few reports that need massive hash table manipulation, which is a whole other problem). I will look and see if DBD::Sybase will work any better.

        How did you ascertain that DB communication is a significant portion of the time? That is not my experience with DBD::ODBC and MS SQL Server with two exceptions which are a) when retrieving very large result-sets (and this could be improved by using array binding in DBD::ODBC but I've never had the impetus to do it myself) and b) when using dynamic or server-side cursors.

        I'd be most interested in your usage case if it really demonstrates a high proportion of the time in DB communication.

        Unlike some other people who claim using ODBC slows everything down (their usual reason is based on ODBC adds another layer over the TDS protocol and hence it must be slower) I've seldom seen an actual real life case with numbers that categorically proves adding ODBC makes a signifcant difference.

Re: Most efficient MS SQL connection
by BrowserUk (Patriarch) on Mar 02, 2010 at 19:20 UTC

    I'm not sure how useful this is, because it's been a long time since I did anything with MSSQL. But I do remember that in days gone by, SQL Server allowed local LAN segment connections via NamedPipe, and that these were significantly quicker than WAN connections via tcp.

    Whether that is still the case, possible in your environment, or exposed by DBI I have no idea, but it might be worth your time to find out.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      or exposed by DBI is not something you need to worry about since I believe the feature you are referring to is set in the ODBC datasource for the driver and does not need any specific support in DBI. Even if you are using DSN-less connections you should be able to find the SQL Server driver attributes for Named Pipes and add them to your connect call.

        Hmmmm. I'm not sure, as I tried to indicate, whatever little I knew on the subject is shrouded in the mists of my mind, but I think that going via the ODBC layer to get to the named pipe would largely, if not completely, negate any (performance) benefits of doing so.

        That said: I just found this on msdn, which tends to indicate that there is no advantage over tcpip by using the NamedPipe protocol connection on a fast lan:

        Named Pipes vs. TCP/IP Sockets

        In a fast local area network (LAN) environment, Transmission Control Protocol/Internet Protocol (TCP/IP) Sockets and Named Pipes clients are comparable with regard to performance. However, the performance difference between the TCP/IP Sockets and Named Pipes clients becomes apparent with slower networks, such as across wide area networks (WANs) or dial-up networks. This is because of the different ways the interprocess communication (IPC) mechanisms communicate between peers.

        For named pipes, network communications are typically more interactive. A peer does not send data until another peer asks for it using a read command. A network read typically involves a series of peek named pipes messages before it starts to read the data. These can be very costly in a slow network and cause excessive network traffic, which in turn affects other network clients.

        It is also important to clarify if you are talking about local pipes or network pipes. If the server application is running locally on the computer that is running an instance of SQL Server, the local Named Pipes protocol is an option. Local named pipes runs in kernel mode and is very fast.

        For TCP/IP Sockets, data transmissions are more streamlined and have less overhead. Data transmissions can also take advantage of TCP/IP Sockets performance enhancement mechanisms such as windowing, delayed acknowledgements, and so on. This can be very helpful in a slow network. Depending on the type of applications, such performance differences can be significant.

        TCP/IP Sockets also support a backlog queue. This can provide a limited smoothing effect compared to named pipes that could lead to pipe-busy errors when you are trying to connect to SQL Server.

        Generally, TCP/IP is preferred in a slow LAN, WAN, or dial-up network, whereas named pipes can be a better choice when network speed is not the issue, as it offers more functionality, ease of use, and configuration options.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Most efficient MS SQL connection
by dHarry (Abbot) on Mar 03, 2010 at 08:26 UTC

    is there a better method to use?

    That depends very much on the architecture of the application. If your queries are complex it takes time, you'll probably have to live with that. You could try to (further) optimize your queries, have you already tried that? Maybe you could do some batch processing, populating some of the tables at a better time, e.g. at night?

    It's not clear from your description how much processing is done by the web server. Fetching a lot of data from the database is a typical bottleneck leading to a lot of network traffic and long connect times. One solution is than to move more logic into the database, e.g. using Stored Procedures can often boost performance. Another solution is to change the architecture and put in another layer, e.g. add an application server

    My gut feeling is that the real problem is not the way you connect to the database but how the application is partitioned: what is done where.

    Cheers

    Harry

      The vast bulk of the logic is performed on the web front end (we have just recently migrated from a single server to a split architecture). Even with a fairly simple data retrieval and display (just grab rows from tables with no more than a single join and dump to the screen with almost no front end logic), it takes a disproportionate amount of time to retrieve results. I will agree that the larger performance hit comes with complexity of statements vice complexity of post processing logic. We are getting ready to deploy the new architecture to production shortly (finishing certification process), so I will have better 'real world' performance stats soon.

        Interesting, even when "there is not much to do" it appears to be slow. I think you have to do some measurements to find out where exactly the bottleneck is. If you execute the query from a SQL client does it take a long time too? One thing I can think of is the overhead introduced by initiating a connection. Does the application initiates a lot of (new) connections or does it reuse connections, i.e. uses connection pooling?

        Even with a fairly simple data retrieval and display (just grab rows from tables with no more than a single join and dump to the screen with almost no front end logic), it takes a disproportionate amount of time to retrieve results.

        You seem to be shying away from giving us an example of what you call proportionately slow. Dumping to the screen can take a long time as you are performing IO on a terminal and scrolling etc. Here is a concrete example to MS SQL Server via an ODBC driver on a remote machine (to the database) connected through a router and 100M ethernet. The machine running MS SQL Server is running SQL Server express and it was dumped even as a desktop it is so old and slow (would you call this slow):

        use DBI; use strict; use warnings; use Benchmark::Timer; use Data::Dumper; my $t = Benchmark::Timer->new; $t->start('main'); my $h = DBI->connect; if ($ARGV[0]) { print "Recreating table\n"; eval {$h->do(q/drop table mje/);}; $h->do(q/create table mje (a varchar(50), b varchar(50), c varchar +(50), d varchar(50))/); $h->begin_work; my $s = $h->prepare(q/insert into mje values(?,?,?,?)/); my $a = 'a' x 50; my $b = 'b' x 50; my $c = 'c' x 50; my $d = 'd' x 50; foreach (1..50000) { $s->execute($a, $b, $c, $d); } $h->commit; } $t->stop('main'); $t->start('fetch'); my $r = $h->selectall_arrayref(q/select * from mje/); $t->stop('fetch'); $t->start('dump'); print Dumper($r); $t->stop('dump'); print "Rows fetched:", scalar(@$r), "\n"; print $t->reports;

        which outputs

        many many lines like the following [ 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb', 'cccccccccccccccccccccccccccccccccccccccccccccccccc', 'dddddddddddddddddddddddddddddddddddddddddddddddddd' ] Rows fetched:50000 main1 trial of main (44.867ms total) fetch1 trial of fetch (1.715s total) dump1 trial of dump (31.614s total)

        I only keep harping on about this because I'm not sure you really have identified what is slow and you have not given us any detail to decide.