knoebi has asked for the wisdom of the Perl Monks concerning the following question:

At my job I have to maintain many different databases (most of them MySQL or Postgres), each with different schemas and applications running on them. Unfortunately many tables don't have enough indexes, and some of the databases don't have indexes at all, which is a really bad situation at the moment and is driving the load up.
So I'm looking for a solution to autogenerate the indexes, and I have thought about the following ideas:

1) Hook directly into DBI (most of the applications are written in Perl, lucky me), analyze the queries for joins, WHERE and ORDER BY clauses, and generate indexes for those columns (a rough sketch of this follows below).

2) Do the same as in 1, but log all the queries at the database level and parse the log file.
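
For idea 1, the sketch below is the kind of thing I have in mind (DSN, credentials, table name and log path are just made-up placeholders): subclass DBI so that prepare() can be intercepted and every statement logged for later analysis.

    package MyDBI;
    use strict;
    use warnings;
    use DBI;
    our @ISA = ('DBI');

    package MyDBI::db;
    our @ISA = ('DBI::db');

    # Log the SQL text of every statement, then hand off to the real prepare().
    sub prepare {
        my ($dbh, $sql, @attr) = @_;
        open my $log, '>>', '/tmp/query.log' or die "cannot open query log: $!";
        print {$log} $sql, "\n";
        close $log;
        return $dbh->SUPER::prepare($sql, @attr);
    }

    package MyDBI::st;
    our @ISA = ('DBI::st');

    package main;

    # Connecting through the subclass makes DBI hand back MyDBI::db handles.
    my $dbh = MyDBI->connect('dbi:mysql:database=test', 'user', 'pass',
                             { RaiseError => 1 });

    my $sth = $dbh->prepare('SELECT name FROM users WHERE id = ?');
    $sth->execute(42);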

Well, I'm not sure, but I think this could be a fairly common problem: often the database people and the application people are not the same, and the applications get changed all the time. So perhaps somebody has already come up with a more flexible solution?

Some of my wishes:
-database independent (or at least portable)
-priority for indexes (e.g. how often the potential index would be used)

Thank you for the time you took to read this post, and especially for your replies.

Replies are listed 'Best First'.
Re: Autogenerate SQL Indexes?
by herveus (Prior) on Aug 24, 2004 at 11:45 UTC
    Howdy!

    Danger! Danger!

    Before you go adding indexes willy-nilly, you have to have some understanding of the data and how it is being used. Indexes do not automatically give you increased performance. In fact, in many cases, indexes can slow things down.

    Is the database transaction-heavy or not? Having many indexes in a database with heavy insert/update/delete activity can really bite you, as those indexes must be maintained in parallel with the table changes.

    On the other hand, if the data is relatively static, used mostly for queries, carefully crafted indexes can make a dramatic improvement in performance. Poorly crafted indexes could do little but consume space.

    Another factor you must keep in mind is how the query optimizer works for the DBMS in question. Indexes need to be crafted (there's that word again) with that in mind, if they are to be effective.

    Proper indexing is something of an art; it's one of those things that help keep database administrators occupied. As access patterns change indexes may need changing. I'll be bold and claim that you aren't going to find the general solution you seek. Sorry about that.

    yours,
    Michael
      I agree with herveus. Indexes do not suit every table in the database (DB).
      Since you said you have the log file at hand, find out the tables that hold large amounts of data. If the data in them is only inserted or queried, you can go ahead and index them. If updates/deletes happen too often, indexing would not be such a great thing, as the indexes have to be maintained each time you update the DB.
      Indexing is a tricky thing; you have to observe how your DB behaves in the production environment. Indexing should only be carried out when fetching (querying) is the dominant activity.
      Regards, Swaroop
Re: Autogenerate SQL Indexes?
by tilly (Archbishop) on Aug 24, 2004 at 15:54 UTC
    As others said, this is dangerous to automate. But you can simplify your life if you rely on the 80/20 rule. (80% of the benefit can be had for 20% of the work.)

    The principle is simple. The practice is harder. The principle is that you want to turn on profiling to be able to identify your top performance hogs. One by one try to optimize them. After a few hotspots are improved, your database should be happier. Repeat every time your load is becoming a problem again.

    I don't know too much about what profiling tools exist for postgres and mysql; however, a quick google turned up this article, which describes one that may do the trick. You probably want to look for more on this. (In particular I know that Oracle has a ton of tools for monitoring the different kinds of stress that queries put on the database. For instance you can identify which queries take the most disk accesses, internal locking, CPU time, etc...)
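
    For example (DSN and query are made up), DBI itself ships with DBI::Profile, which can point at the hungriest statements from the Perl side with nothing more than a connect attribute (or the DBI_PROFILE environment variable). It is not a server-side tool like the one in the article, but it can serve as a first pass:

        use strict;
        use warnings;
        use DBI;

        # Profile => 2 aggregates elapsed time per distinct statement text;
        # DBI::Profile writes a summary report to STDERR when the handles go away.
        my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'pass',
                               { RaiseError => 1, Profile => 2 });

        my $sth = $dbh->prepare('SELECT * FROM orders WHERE cust_id = ?');
        $sth->execute($_) for 1 .. 100;

        $dbh->disconnect;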

Re: Autogenerate SQL Indexes?
by suaveant (Parson) on Aug 24, 2004 at 14:35 UTC
    I must agree with herveus, indexes should really be crafted to the situation.

    That being said, you could analyze your queries for what is being joined on to build yourself a list of candidate columns, then address them each by hand. That would at least give you a starting point, I suppose.

    Be warned, however: I am not sure whether there are any CPAN modules for this, but having written my own code to parse SQL (poorly) in the past, it is not for the faint of heart, especially if you are going cross-platform.

    Write something and use it as a guide to narrow your options, but make the changes yourself, and test them. I have made slight changes to MySQL and actually slowed down a query because the optimizer got confused. EXPLAIN is your friend.
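
    As a small illustration (table and column names made up), you can eyeball a candidate query's plan straight from Perl, before and after adding an index:

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'pass',
                               { RaiseError => 1 });

        my $sql = 'SELECT * FROM orders WHERE cust_id = 42';

        # MySQL's EXPLAIN gives one row per table in the plan; 'key' shows which
        # index (if any) the optimizer picked and 'rows' how many rows it expects
        # to examine.
        for my $row (@{ $dbh->selectall_arrayref("EXPLAIN $sql", { Slice => {} }) }) {
            printf "table=%s type=%s key=%s rows=%s\n",
                map { defined $row->{$_} ? $row->{$_} : 'NULL' }
                    qw(table type key rows);
        }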

    HINTS:

    You may have to put something into DBI. If the queries don't specify what database the tables are in explicitly, and you have tables of the same name in different databases, then you will need a preliminary query to get your context.

    Parse the FROM clause, get a list of tables and their aliases (if any), and describe the tables into a hash for lookup in case you have fields without a table specified.

    Parse through the WHERE clause, check any comparisons that are simple field-to-field comparisons and ignore the others; this will get you most of your key situations and simplify your life.

    Keep track of how many times a table is accessed and how many times each field in that table is accessed; if you have 20 accesses to a table but a field is accessed only once in those 20, it is less likely to need keying.
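
    A very rough sketch of those steps follows (regex-based, so it only copes with simple queries; anything with subqueries, quoting or functions really wants a proper parser such as SQL::Statement from CPAN; the table names are made up):

        use strict;
        use warnings;

        my %column_hits;    # "table.column" => number of times seen in comparisons

        sub analyze_query {
            my ($sql) = @_;

            # Parse the FROM clause into an alias => table map.
            my %alias;
            if ($sql =~ /\bFROM\s+(.*?)(?:\bWHERE\b|\bORDER\b|\bGROUP\b|$)/is) {
                for my $part (split /\s*,\s*/, $1) {
                    my ($table, $as) = $part =~ /^(\w+)(?:\s+(?:AS\s+)?(\w+))?/i;
                    next unless $table;
                    $alias{ $as || $table } = $table;
                }
            }

            # Only count simple "a.x = b.y" comparisons in the WHERE clause;
            # everything else is ignored.
            if ($sql =~ /\bWHERE\s+(.*)$/is) {
                my $where = $1;
                while ($where =~ /(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)/g) {
                    for my $ref ([$1, $2], [$3, $4]) {
                        my $table = $alias{ $ref->[0] } || $ref->[0];
                        $column_hits{"$table.$ref->[1]"}++;
                    }
                }
            }
        }

        # Feed it the logged queries, then list index candidates by frequency.
        analyze_query($_) for (
            'SELECT * FROM orders o, customers c WHERE o.cust_id = c.id',
            'SELECT * FROM orders o, items i WHERE o.id = i.order_id',
        );
        printf "%-30s %d\n", $_, $column_hits{$_}
            for sort { $column_hits{$b} <=> $column_hits{$a} } keys %column_hits;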

    Good Luck.

                    - Ant
                    - Some of my best work - (1 2 3)