...for Sybase the optimizer will use a generic weighting when it doesn't know the SARGS...
Well, yes, and that's exactly the problem. Generic weighting simply tells the optimizer that it has a highly unselective index, and thus the index will never be used.
This is actually a common case with columns like "status" on task-tracking or ticket-tracking systems. If you've got a database with a million tasks, and nearly all of them are "COMPLETED", but the ones you are typically interested in are "OPEN" (a small minority of the total population), then, bang, it's the exact issue I showed. Your RDBMS will say "I don't know what value they want... but I know that the average value occupies half of the table. Accessing half of a table by index is a bad idea, so I'll full-table-scan."
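As a back-of-the-envelope sketch (row counts here are made up for illustration, not from any real system), the "average value occupies half of the table" estimate falls straight out of the usual rows-divided-by-distinct-values guess that a generic weighting amounts to:

```python
# Hypothetical skewed "status" column, per the scenario above.
total_rows = 1_000_000
status_counts = {"COMPLETED": 999_000, "OPEN": 1_000}

# With no knowledge of the bound value, a generic weighting assumes
# every distinct value is equally common: rows / distinct values.
generic_estimate = total_rows / len(status_counts)

print(generic_estimate)       # 500000.0 -- "half the table", so no index
print(status_counts["OPEN"])  # 1000     -- what the query actually returns
```

With two distinct values the generic estimate is exactly half the table, which is why the optimizer walks away from an index that would have been a thousand-to-one win for the "OPEN" lookup.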
And, yeah, we are getting away from perl somewhat, but the original post was really about database performance tuning, so I don't feel so bad answering in kind. Also, there are a lot of "grey areas" around perl that folks on this site deal with a lot: CGI, HTML, shell/command-line, file-system, and databases. If perl is commonly used as a glue between these pieces... the line starts to blur a little bit with respect to what is fair game for discussion.
------------
:Wq
Not an editor command: Wq
This is actually a common case with columns like "status" on task-tracking or ticket-tracking systems.
I wanted to see how Sybase handles this case, so I did the following little test with the text_events table, which is used by Sybase to track what rows in source tables need to be updated in a Verity Full-Text index.
select count(*), event_status from text_events group by event_status;
             event_status
 ----------- ------------
          26            0
     2352103            2
       81921            3
Rows with event_status == 0 have to be processed, and there's an index on event_status.
I first ran a SQL snippet like this:
declare @status int
select @status = 0
select * from text_events where event_status = @status
And this did indeed do a table scan.
Then I tried a short DBI script, using placeholders:
my $sth = $dbh->prepare("select * from payment_db..text_events where event_status = ?");
$sth->execute(1);
fetch_all($sth);
$sth->execute(0);
fetch_all($sth);
In this case the index is used every time. Then I wrote an ad-hoc stored procedure to do the same thing, and I get a table scan. Which just goes to show that this sort of thing is a bit of a black art :-)
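The DBI snippet above won't run as-is without a Sybase connection and the poster's `fetch_all` helper, but the prepare-once / execute-with-different-binds pattern it demonstrates can be sketched self-contained with Python's sqlite3 (table scaled down to 1,000 rows with the same 26-rows-pending skew; whether the index is actually chosen still depends entirely on the engine's planner, which is the whole point of the thread):

```python
import sqlite3

# Scaled-down stand-in for the text_events test above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE text_events (id INTEGER, event_status INTEGER)")
conn.executemany(
    "INSERT INTO text_events VALUES (?, ?)",
    [(i, 0 if i < 26 else 2) for i in range(1000)],  # 26 pending, 974 done
)
conn.execute("CREATE INDEX text_events_status ON text_events(event_status)")

# One query text, executed with different bound values; the statement
# is planned without knowing which value will be supplied.
query = "SELECT count(*) FROM text_events WHERE event_status = ?"
print(conn.execute(query, (2,)).fetchone()[0])  # 974
print(conn.execute(query, (0,)).fetchone()[0])  # 26
```

The rows come back correct either way; the black art is in which access path each engine picks for the value-agnostic plan.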
Michael
analyze table <table name> compute statistics for all indexed columns size 254;
will generate a histogram of weightings for each range of values in the indexed columns. Parse trees can then be generated on a case-by-case basis. In the latest versions you can "fake" the stats to get the optimizer to behave properly.
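Using the row counts from the Sybase text_events example earlier in the thread, a toy sketch of what a per-value histogram buys the optimizer over the generic weighting (the estimator function here is a simplification for illustration, not any vendor's actual algorithm):

```python
# Row counts taken from the text_events example above.
total_rows = 2_434_050
histogram = {0: 26, 2: 2_352_103, 3: 81_921}  # per-value row counts

def estimated_rows(value, use_histogram):
    """Toy cardinality estimate for `WHERE event_status = value`."""
    if use_histogram:
        return histogram.get(value, 0)
    # Generic weighting: assume every distinct value is equally common.
    return total_rows / len(histogram)

print(estimated_rows(0, use_histogram=False))  # 811350.0 -> looks unselective
print(estimated_rows(0, use_histogram=True))   # 26       -> index is a clear win
```

With the histogram in place the optimizer can see that event_status = 0 touches 26 rows, not a third of the table, so the indexed plan becomes the obvious choice.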
Yes, I understand how histogram statistics work... my point was that if you bind a value which the CBO would be interested in... it is only going to come up with one execution plan for the query, regardless of which value it is executed with.
Or am I just misunderstanding what you are saying? Is there some hint I don't know of yet... something along the lines of /*+ force_reparse_after_parameter_substitution */? If so, I'd love to hear it, because it would be really useful for me in a few situations: situations where an application only allows a query to be parameterized by bind values, and I want the query to stay parameterizable without suffering the hamstringing that bind parameters inflict on the CBO.
You can always issue an alter system flush shared_pool, which will force re-compiles (at the cost of performance), but the official Oracle line is to either use literals (which is what we don't want) or to force the use of an index with the /*+ INDEX(<analyzed index>) */ hint. Mind you, the official Oracle line is also to use DBMS_STATS rather than analyze, but I've never managed to get it to compile stats on a large table in a reasonable amount of time, where I count "reasonable time" as the time it takes "analyze" to complete the same operation (and I've never been able to let it run long enough to compile stats with the "cascade" option - batch windows are such a constraint). So when it comes to taking the Oracle line... the track record isn't terribly good.
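For reference, the commands described above spelled out as runnable SQL; the table, index, and schema names are hypothetical stand-ins, and the exact behavior (especially the cost of flushing the shared pool on a busy system) should be checked against your Oracle version's documentation:

```sql
-- Force re-parsing of all cached statements (expensive on a busy system):
ALTER SYSTEM FLUSH SHARED_POOL;

-- Force a specific index with a hint instead of trusting the CBO
-- (table alias "t" and index name are hypothetical):
SELECT /*+ INDEX(t text_events_status_idx) */ *
  FROM text_events t
 WHERE event_status = :status;

-- The DBMS_STATS route Oracle recommends in place of ANALYZE,
-- with the 254-bucket histogram option and cascade to indexes:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'PAYMENT_DB',
    tabname    => 'TEXT_EVENTS',
    method_opt => 'FOR ALL INDEXED COLUMNS SIZE 254',
    cascade    => TRUE);
END;
/
```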