Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Re: Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE

by mpeppler (Vicar)
on Nov 14, 2003 at 22:19 UTC ( [id://307228]=note: print w/replies, xml ) Need Help??


in reply to Re: Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE
in thread Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE

There are usually ways to trick the RDBMS query optimizer into doing the "right" thing, even with border case queries like the one you show. I don't know about Oracle, but for Sybase the optimizer will use a generic weighting when it doesn't know the SARGS (search arguments) or when the SARGS are the result of functions or local variables (as is the case for bound variables via placeholders). If the index histogram is reasonable then the generated query plan is probably going to be quite reasonable - after all it's the same thing that happens when using stored procedures, and they work really well (with Sybase).

Michael

PS - we're drifting quite a long way from perl here...

  • Comment on Re: Re: Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE

Replies are listed 'Best First'.
Re: Re: Re: Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE
by etcshadow (Priest) on Nov 14, 2003 at 23:09 UTC
    ...for Sybase the optimizer will use a generic weighting when it doesn't know the SARGS...

    Well, yes, and that's exactly the problem. Generic weighting simply tells you that you have a highly-unselective index, and, thus, the index will never be used.

    This is actually a common case with columns like "status" on task-tracking or ticket-tracking systems. If you've got a database with a million tasks, and nearly all of them are "COMPLETED", but the ones that you are typically interested in are "OPEN" (a small minority of the total population), then, bang, it's the exact issue I showed. You RDBMS will say "I don't know what value they want... but I know that the average value occupies half of the table. Accessing half of a table by index is a bad idea, so I'll full-table-scan."

    And, yeah, we are getting away from perl somewhat, but the original post was really about database performance tuning, so I don't feel so bad answering in kind. Also, there are a lot of "grey areas" around perl that folks on this site deal with a lot: CGI, HTML, shell/command-line, file-system, and databases. If perl is commonly used as a glue between these pieces... the line starts to blur a little bit with respect to what is fair game for discussion.


    ------------
    :Wq
    Not an editor command: Wq
      This is actually a common case with columns like "status" on task-tracking or ticket-tracking systems.
      I wanted to see how Sybase handles this case, so I did the following little test with the text_events table, which is used by Sybase to track what rows in source tables need to be updated in a Verity Full-Text index.
      select count(*), event_status from text_events group by event_status; event_status ----------- ------------ 26 0 2352103 2 81921 3
      Rows with event_status == 0 have to be processed, and there's an index on event_status.

      I first ran a SQL snippet like this:

      declare @status int select @status = 0 select * from text_events where status = @status
      And this did indeed do a table scan.

      Then I tried a short DBI script, using placeholders:

      my $sth = $dbh->prepare("select * from payment_db..text_events where e +vent_status = ?"); $sth->execute(1); fetch_all($sth); $sth->execute(0); fetch_all($sth);
      In this case the index is used every time. Then I wrote an ad-hoc stored procedure to do the same thing, and I get a table scan. Which just goes to show that this sort of thing is a bit of a black art :-)

      Michael

Re: Re: Re: Use Placeholders. For SECURITY and (sometimes) for PERFORMANCE
by rdfield (Priest) on Nov 14, 2003 at 23:45 UTC
    In Oracle
    analyze table <table name> compute statistics for all indexed columns +size 254;
    with generate a histogram of weightings for each range of values that are in the indexed columns. Parse trees can be generated on a case by case basis. In the latest versions you can "fake" the stats to get the optimizer to behave properly.

    rdfield

      With Sybase you'd use the optdiag tool to get information about index statistics and index distribution, and to load simulated index statistics.

      To see query plans you use set showplan on, and dbcc traceflags 302 and 310 are used to get a full explanation of the choices made by the optimizer (very useful when the optimizer decides to do things that seem very counter-intuitive to you).

      Michael

      Yes, I understand how histogram statistics work... my point was that if you bind a value which the CBO would be interested in... it is only going to come up with one execution plan for the query, regardless of which value it is executed with.

      Or am I just misunderstanding what you are saying? Is there some hint I don't know of yet... something along the lines of: /*+ force_reparse_after_parameter_substitution */? If so, I'd love to hear it... because it would be really useful for me in a few situations. Situations where an application only allows for a query to be parameterized by bind values... and I want the query to be parametrizable, but not to suffer the hamstringing of the CBO that bind parameters are causing.


      ------------
      :Wq
      Not an editor command: Wq
        You can always issue a alter system flush shared_pool which will force re-compliles (at the cost of performance), but the official Oracle line is to either use literals (which is what we don't want) or to force the use of an index with the /*+ INDEX(<analyzed index>) */ hint. Mind you, the official Oracle line is to use DBMS_STATS rather than analyze, but I've never managed to get it to compile stats on a large table in a reasonable amount of time (and I've never been able let it run long enough to complile stats with the "cascade" option - batch windows are such a constraint), where I count "reasonable time" as the time it takes "analyze" to complete the same operation, so when it comes to taking the Oracle line...the track record isn't terribly good.

        rdfield

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://307228]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (4)
As of 2024-04-25 17:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found