gofaster has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

We have a web service using:

    Redhat Linux AS3
    Perl 5.8.5
    Apache 1.3.10
    mod_perl
    mod_soap
    SOAP::Lite
    <<our code>>
    DBI
    Apache::DBI
    DBD::Oracle
    OCI

This talks to a database server over the local network (during testing), which runs:

    Redhat Linux AS3
    Oracle Enterprise Database Server 9i

We have been doing performance testing, in which we send many SOAP requests per second to the web server and measure the time it takes to handle each transaction. The average time is less than 500ms. Unfortunately, every once in a while, we have a transaction that takes 20 seconds or longer.

Using Apache::DProf, it looks like a lot of the real (wall clock) time is happening during:

    DBI::connect_cached x 121 7.58s = (0.02 + 7.56)s
    DBD::Oracle::db::ping x 121 7.56s = (0.00 + 7.56)s
    DBD::_mem::common::DESTROY x 121 0.00s
    DBI::common::DESTROY x 242 0.00s
    DBI::common::FETCH x 121 0.00s
    DBI::db::prepare x 121 7.56s = (0.00 + 7.56)s
    DBD::Oracle::db::prepare x 121 7.56s = (0.00 + 7.56)s
    DBD::Oracle::st::_prepare x 121 7.56s

But the same view of user+system (CPU) time shows almost no time there.
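A large gap between wall-clock time and user+system time usually means the process was waiting (on the network, a lock, or the database) rather than computing. A minimal sketch of measuring both around a single call, using only core modules; the `timed` helper and the `sleep` stand-in are hypothetical and not part of the original code:

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Run $code once and return (wall-clock seconds, CPU seconds).
# CPU time is user + system for this process, as reported by times().
sub timed {
    my ($code) = @_;
    my $wall0 = time();
    my ($user0, $sys0) = times;
    $code->();
    my ($user1, $sys1) = times;
    my $wall1 = time();
    return ($wall1 - $wall0, ($user1 - $user0) + ($sys1 - $sys0));
}

# Stand-in for a call that blocks on I/O, such as $dbh->prepare($sql).
my ($wall, $cpu) = timed(sub { sleep(0.2) });
printf "wall=%.3fs cpu=%.3fs\n", $wall, $cpu;
```

Wrapping the real `prepare` call this way and logging only the slow cases would show whether the delay is spent waiting or spent on CPU.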

On the database side, a SQLNet trace shows that it is just waiting (during these long transactions) for the client side to finish the request.

All of the times within the database itself are sub-second.

Has anyone else experienced anything like this? Any idea what would cause about one transaction in a thousand to take upwards of 40x the average processing time?

Thanks for any ideas!

Re: Apache::DBI + DBD::Oracle performance problem
by etcshadow (Priest) on Mar 30, 2005 at 23:54 UTC
    Super short version: client thinks the server is slow, server thinks the client is slow. Sounds like a network problem to me. You might want to try doing some network diagnostics while experiencing these slowdowns. There are many ways in which things could be messed up: bad hardware, bad wires, bad configurations. And it could be at the IP layer (if the server and client are not on the same LAN), or at the Ethernet layer.

    Specific examples I've seen like this are: bad Ethernet cables (sad but true), some sort of @#$%ed up arrangement of your switches (cycles or diamonds in your topology), Ethernet interfaces renegotiating their speed/duplex instead of just being locked down, bad routes, etc.

    Anyway, just a wild stab in the dark.

    ------------ :Wq Not an editor command: Wq

      Some other annoying network quirks that I've run into:

      • Ethernet collisions (okay, this one's easy... look for a segment in your trace that's unswitched, and see if the hub's blinking too much)
      • Network card on fire (look for the smoke)
      • Incorrect autonegotiation (one thinks it's full duplex, the other thinks half duplex)
      • Too many hops on an Ethernet segment (traffic should not go through more than 3 repeaters; this shouldn't be a problem on a switched network).
      • Poorly seated cables. (pull them all out, push them back in fully).
      • Crosstalk (or other inductance). (not typically an issue, unless you're running incorrectly spec'd cables, or near phone/power lines)

      More often than not, though, it's either a saturated network or, as etcshadow said, a bad cable (poorly made, crushed, improper bend radius, nicked, etc.) or a misconfiguration. (Bad hardware happens, but once it's burned in, it's fairly rare in my experience.)

      If you can sniff the network on either end, I'd suggest doing it to see if you can spot anything wrong. You also have to remember that you're not running a real-time OS (very few people are, as they can cause worse overall performance in most situations), so you can't be assured that anything is going to happen within a certain time frame.

Re: Apache::DBI + DBD::Oracle performance problem
by etcshadow (Priest) on Mar 31, 2005 at 20:03 UTC
    Hmm... actually I made a little mistake reading your performance output there... I thought it was saying that each operation took 7.56 seconds (which is part of why I was thinking network problems). But it's actually saying that just the prepare phase took that long, and that is indicative of an entirely different problem: slow parses. You could have some issues with your v$sqlarea and/or your shared pool (either of which could interfere with your soft-parse), or you might have some problems with your optimizer, which I'd assume is cost-based, which means that it could be a problem with your statistics.

    So, here are some questions:

    • Do you bind the data in your SQL? If not, you could be causing your sqlarea to get out of control, size-wise, and cause soft-parses (part of the prepare phase) to take a really long time. However, if that were exactly the case, you'd be more likely to see a steady degradation of your performance (not so much that just one query would hang forever while all others were nice and snappy).
    • Do you regularly flush your shared pool? When you do flush your shared pool, do you see big chunks of it lingering around anyway? This can happen due to some problems analogous to how circular references can interfere with reference-counting garbage collectors.
    • Which optimizer mode are you using (cost or rule)? If you are using the cost-based optimizer mode, then when and how often do you generate or regenerate statistics? How do you generate statistics? Problems with your statistics could (I imagine, in extreme cases) cause the optimizer to take crazy-long periods of time during a hard-parse (the first parse of a query... the time at which the optimizer builds an execution plan for your query and stores it in the shared pool).
    • Do you have a DBA? This is really more of a question for a real DBA (which I am not... although I do know a good bit of random Oracle administration trivia).
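On the first question above (binding), a rough illustration of why it matters: with literal values every request yields a different statement text for the database to parse, while a placeholder yields one. This is only a sketch; the `accounts` table, the query, and the `distinct_statements` helper are hypothetical, and it counts statement texts in Perl rather than talking to Oracle.

```perl
use strict;
use warnings;

# Count the distinct SQL texts the database would be asked to parse.
sub distinct_statements {
    my (@sql) = @_;
    my %seen;
    $seen{$_}++ for @sql;
    return scalar keys %seen;
}

my @ids = (1 .. 1000);

# Literal values: every request produces a new statement text.
my @literal = map { "SELECT name FROM accounts WHERE id = $_" } @ids;

# Placeholder: one statement text, with the value supplied at execute time.
my @bound = map { "SELECT name FROM accounts WHERE id = ?" } @ids;

printf "literal: %d distinct statements\n", distinct_statements(@literal);
printf "bound:   %d distinct statements\n", distinct_statements(@bound);

# With DBI the bound form looks like this (it needs a live handle, so it
# is left as a comment here):
#   my $sth = $dbh->prepare_cached("SELECT name FROM accounts WHERE id = ?");
#   $sth->execute($id);
```

The literal form reports 1000 distinct statements and the bound form reports 1, which is the difference the shared pool sees.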
    ------------ :Wq Not an editor command: Wq
Re: Apache::DBI + DBD::Oracle performance problem
by dragonchild (Archbishop) on Mar 31, 2005 at 13:21 UTC
    Network problems are possible, but unlikely. I would first try to replicate the problem consistently by writing a script that you can fire off to demonstrate the problem to anyone who cares.

    Once you do that, you will have all the information you need to solve the problem.
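A reproduction script along these lines might time each request and flag the ones far above the median. This is a sketch under assumptions: `time_requests` and `outliers` are hypothetical helpers, the request body is a stand-in that should be replaced with one real SOAP call, and the 40x factor is taken from the question.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Call $request $count times and return the elapsed seconds of each call.
sub time_requests {
    my ($request, $count) = @_;
    my @elapsed;
    for my $i (1 .. $count) {
        my $start = time();
        $request->($i);
        push @elapsed, time() - $start;
    }
    return @elapsed;
}

# Return the indexes of timings more than $factor times the median.
sub outliers {
    my ($factor, @elapsed) = @_;
    my @sorted = sort { $a <=> $b } @elapsed;
    my $median = $sorted[ int(@sorted / 2) ];
    return grep { $elapsed[$_] > $factor * $median } 0 .. $#elapsed;
}

# Stand-in request: replace the body with one real SOAP call.
my @elapsed = time_requests(sub { my $x = 0; $x += $_ for 1 .. 1000 }, 200);
my @slow    = outliers(40, @elapsed);

printf "%d of %d requests exceeded 40x the median\n",
    scalar @slow, scalar @elapsed;
printf "  request %d took %.3fs\n", $_ + 1, $elapsed[$_] for @slow;
```

Logging a timestamp alongside each slow request would make it possible to line the outliers up with the SQLNet trace on the database side.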

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.